Preface


This handbook is intended to provide a broad, interdisciplinary overview of longitudinal research 
designs and longitudinal data analysis. Many of the chapters are written at an introductory level, 
to introduce readers to topics with which they may not be familiar, but some topics, unavoidably, 
require a higher level of mathematical sophistication than others. Even for those relatively more 
mathematically challenging chapters, the hope is that the reader (even the novice reader) will gain 
enough insight into the application and potential of the analytical methods to better understand 
them when they are presented, and to determine whether they would be helpful in their own 
research. It is expected that many readers will be relatively well acquainted with some of the 
methods covered in this volume, but will have much more limited exposure to others. The hope 
is that for more advanced readers, this handbook may fill in some gaps, and suggest topics worth 
further examination, particularly by placing them in a context that easily allows comparison with 
those methods that are more familiar to the reader. 

Although this handbook is not primarily designed as a textbook, it can be used as a core text 
in a course on longitudinal research design, longitudinal data analysis, or a course combining 
both topics. Sections I and II lay a foundation in longitudinal research design, and along with 
selected material from other sections, particularly from Section III, would be most suitable for a 
course emphasizing longitudinal research design. Sections IV-VII focus on different techniques
of analysis in longitudinal research, and are perhaps broader in scope than most treatments of 
longitudinal data analysis. For a survey of longitudinal data analysis, these sections, plus some 
or all of the chapters from Section III, would be appropriate. For a general introduction involving
both design and analysis in longitudinal research, this handbook could be used as a core text, 
supplemented by other books or articles. It is anticipated, however, that this handbook will be 
most useful as a reference text for practicing researchers in the field of longitudinal research. 
Chapter 1


Introduction: Longitudinal research 
design and analysis 
Scott Menard 


1 Longitudinal and cross-sectional 
designs for research 


As described in Menard (2002), longitudinal 
research designs can best be understood by 
contrasting them with cross-sectional research 
designs. In a purely cross-sectional design, data 
are collected on one or more variables for a 
single time period. In longitudinal research, 
data are collected on one or more variables 
for two or more time periods, thus allowing 
at least measurement of change and possibly 
explanation of change. There are some designs 
which do not fall neatly under the definition 
of pure cross-sectional research or longitudi- 
nal research. One example is research in which 
data are collected for different times for dif- 
ferent cases, but only once for each variable, 
and the time dimension is ignored. This design 
may be used, for example, when data are not 
all available at the same time, as in Ahluwalia’s 
(1974; 1976) study of economic development 
and income inequality. Although the data come 
from more than one time period, the design 
for any given case, and also the analysis, is 
cross-sectional in nature. The danger here lies
in assuming that relationships are constant over
time; if they are not, a bivariate relationship
estimated from such data may not be the relationship
one would obtain if all of the data were measured
for a single period, but may instead be contaminated
by changes in that relationship over time.

Another possibility is a time-ordered cross- 
sectional design, in which each variable is mea- 
sured only once, but variables are, by design, 
measured at different times. An example of this 
is the study by Tolnay and Christenson (1984), 
who deliberately selected variables which were 
measured at different times for use in a causal 
path analysis of fertility, family planning, and 
development. Each variable was measured at 
the same time for all countries, but different 
variables were measured at different times, in 
order to match the temporal order of measure- 
ment with the causal order in the path model. 
Although measurement occurred at different 
times for different variables, each variable is 
measured only once for each case, and the 
data cannot be used to perform even the sim- 
plest true longitudinal analysis (e.g., measur- 
ing change in a variable from one period to 
another). Once again, the design and the ana- 
lysis are essentially cross-sectional in nature. 
Had Tolnay and Christenson chosen to postu- 
late instantaneous effects, the analysis could 
have been performed just as well with purely 
cross-sectional data. 
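
The point that time-ordered cross-sectional data cannot support even the simplest measurement of change can be illustrated with a minimal sketch. The data layout, variable values, and the function name `change_scores` are hypothetical and chosen only for illustration; a change score requires the same variable to be measured at least twice for the same case.

```python
def change_scores(measurements, first, last):
    """Return last-minus-first change for every case observed in both periods.

    `measurements` maps a case id to a {period: value} dict (hypothetical layout).
    """
    changes = {}
    for case, by_period in measurements.items():
        if first in by_period and last in by_period:
            changes[case] = by_period[last] - by_period[first]
    return changes


# Hypothetical panel data: one variable measured twice for each case.
panel = {
    "A": {1990: 6, 2000: 4},
    "B": {1990: 3, 2000: 4},
}

# Hypothetical time-ordered cross-section: the variable is measured only once
# per case, so there is no second measurement from which to compute change.
time_ordered = {
    "A": {1990: 6},
    "B": {1990: 3},
}

panel_changes = change_scores(panel, 1990, 2000)
cross_section_changes = change_scores(time_ordered, 1990, 2000)
print(panel_changes)
print(cross_section_changes)
```

With the panel layout every case yields a change score; with the time-ordered cross-sectional layout the result is empty, whatever the temporal ordering of the different variables.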

For the purposes of the analysis (evaluating 
direct and indirect effects of family planning 
effort and development on fertility), this design
is reasonable, and may have an advantage over 
models in which causal order in the path 
model and temporal order of measurement are 
not the same (Menard and Elliott 1990a). The 
use of time-ordered cross-sectional data, as in 
Tolnay and Christenson (1984), is desirable 
once temporal order has been established, but 
as described in Menard (2002), it is insufficient
to ensure that one does not “predict” a
cause from its effect. With a true longitudinal 
design and analysis, it might be possible to 
ascertain the true causal direction in the rela- 
tionship between X and Y. With cross-sectional 
data, even time-ordered cross-sectional data, 
we run the risk of undetectable misspecifica- 
tion because of incorrect causal ordering in 
the model being estimated. With longitudinal 
data, incorrect causal ordering is more likely to be
detected, and the model can be corrected.


2 Designs for longitudinal research 


Menard (2002) describes four basic designs for 
longitudinal research: total population designs, 
repeated cross-sectional designs, revolving 
panel designs, and longitudinal panel designs. 
These designs are illustrated in Figure 1.1, and 
examples of each are provided in Chapters 2-6 
of Section I in this volume. In Figure 1.1, 
the horizontal dimension represents the period 
(a month, year, or decade) for which data are 
collected, and the vertical dimension represents 
the cases (population or sample) for which data 
are collected. Moving from left to right, vertical 
lines on the left indicate entry into the popu- 
lation or sample being analyzed, and vertical 
lines on the right indicate exit, as shown in
the first part of Figure 1.1. 

In a total population design, the total popu- 
lation is surveyed or measured in each period 
of the study. Because some individuals die and 
others are born from one period to the next, 
the cases are not identical from one period 
to the next, but if the periods are short, the 
overwhelming majority of cases may be the
same from one period to the next. As one
example, the decennial census of the United 
States attempts to collect data on age, sex, eth- 
nicity, and residence of the total population of 
the United States every ten years, and does so 
with an accuracy estimated at 95-99% (Hogan 
and Robinson 2000; Robey 1989). With some- 
what lower, but still substantial accuracy and 
completeness of coverage, the Federal Bureau of 
Investigation’s Uniform Crime Reports attempt 
to collect data on arrests for specific offenses 
and, for a limited set of offenses, crimes known 
to the police, plus the age, sex, race, and resi- 
dence (urban, suburban, or rural) of arrestees for 
all police jurisdictions in the United States. In 
Chapter 2 of this volume, Margo Anderson illus- 
trates the use of total population data, specifi- 
cally census data, for longitudinal research. To 
the extent that individual data across time are 
recoverable from the total population data, the 
total population design permits the use of all 
possible methods of longitudinal data analy- 
sis, but total population designs are most com- 
monly used in aggregate rather than individual 
level research, and more often involve ana- 
lytical techniques such as those in Chapters 
13-14 (analyzing developmental and histori- 
cal change) and Section VII (time series analy- 
sis and deterministic dynamic models), rather 
than techniques better adapted to analysis of 
change at the individual level. In addition to 
this type of analysis, which focuses on changes 
in the values of variables (e.g., changes in per 
capita gross national product or changes in 
homicide rates) over time, this type of design 
is also well suited to the analysis of changes in 
relationships among variables (e.g., the correla- 
tion between ethnicity and political affiliation, 
or between education and income) over time. 
Each of the other three longitudinal designs 
in Figure 1.1 involves a sample drawn from the 
total population, and is thus a subset of the total 
population design. The three designs differ in 
the extent to which the same or comparable 
cases are studied from one period to the next. 

[Figure 1.1 Longitudinal designs for data collection. Panels: Total Population Design (example: census data), with substantial overlap across time, entry through births, and exit through deaths; Repeated Cross-Sectional Design (example: NORC General Social Surveys), with little or no overlap across time; Revolving Panel Design (example: National Crime Victimization Survey), with partial overlap across time; Multiple Cohort Panel Design (example: British Cohort Studies).]


This distinction has important implications for 
which types of longitudinal analysis are pos- 
sible with each design. In the repeated cross- 
sectional design, the researcher typically draws 
independent probability samples at each measurement
period. These samples will typically
contain entirely different sets of cases for each
period, or the overlap will be so small as to be 
considered negligible, but the cases should be 
as comparable from one period to another as 
would be the case in a total population design. 
An example of the repeated cross-sectional 
design is the General Social Surveys (GSS),
which include an annual general population
sample survey, conducted by the National Opinion
Research Center, that covers a wide range
of topics and emphasizes exact replication of
questions to permit comparisons across time
(Davis and Smith 1992). Thomas W. Smith, in
Chapter 3, describes the GSS, including the 
methods used to collect the data, and gives 
an overview of the types of research that have 
been done using this extensive dataset. Much of
the research involving repeated cross-sectional
data is cross-sectional in nature. Even more
than is the case with total population data, the
analysis of change in repeated cross-sectional
data may involve aggregate level research. Like
the total population design, the repeated
cross-sectional design is well suited to examining
changes in values of variables and in relationships
among variables over time.

Revolving panel designs collect data on 
a sample of cases either retrospectively or 
prospectively for some sequence of measure- 
ment periods, then drop some subjects and 
replace them with new subjects. The revolv- 
ing panel design may reduce problems of 
panel mortality and repeated measurement in 
prospective studies (to be discussed in Section 
II), or problems of extended recall periods in 
retrospective studies. Retention of a particular 
set of cases over several measurement periods 
allows short-term measurement of change on 
the individual or case level, short-term analy- 
sis of intracohort developmental change, and 
panel analysis. Replacement of the subsample 
which is dropped in a measurement period with 
a new but comparable subsample of cases per- 
mits analysis of long-term patterns of aggregate 
change, similar to the analyses possible with 
total population and repeated cross-sectional 
designs. If the time lag between cause and effect 
is smaller than the time (periods) for which 
cases are retained in the sample, analysis of 
temporal and causal order is possible. The combination
of longitudinal data involving repeated
measurement on some cases with data which
do not involve repeated measurement on others 
may permit comparisons which can indicate 
whether repeated measurement is producing 
any bias in the data (e.g., increased or decreased 
willingness to report events after either build- 
ing up some level of trust or finding out that 
reporting leads to long and tedious follow- 
up questions). A good example of a revolving 
panel design is the National Crime Victimiza- 
tion Survey, whose use in longitudinal research 
is described by Lawrence Hotchkiss and Ronet 
Bachman in Chapter 4. 
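
The partial overlap that characterizes a revolving panel can be made concrete with a small sketch. The rotation scheme below (one new subsample entering each period and being retained for a fixed number of periods) is an assumed, simplified schedule, not the actual rotation scheme of any particular survey, and the function names are hypothetical.

```python
def revolving_panel(n_periods, periods_retained):
    """Return, for each period, the set of rotation-group ids in the sample.

    Assumed scheme: one new rotation group enters every period and stays for
    `periods_retained` consecutive periods before being dropped.
    """
    schedule = []
    for t in range(n_periods):
        # Groups are labelled by the period in which they entered; groups that
        # entered before period 0 carry negative labels.
        schedule.append(set(range(t - periods_retained + 1, t + 1)))
    return schedule


def overlap_share(schedule, lag=1):
    """Share of each period's groups that were also in the sample `lag` periods earlier."""
    return [
        len(schedule[t] & schedule[t - lag]) / len(schedule[t])
        for t in range(lag, len(schedule))
    ]


schedule = revolving_panel(n_periods=6, periods_retained=3)
one_period_overlap = overlap_share(schedule, lag=1)
three_period_overlap = overlap_share(schedule, lag=3)
print(one_period_overlap)
print(three_period_overlap)
```

Under this assumed schedule, two thirds of the sample is shared between adjacent periods, which permits short-term panel analysis, while samples three periods apart share no cases and can be compared only in the aggregate, as with repeated cross-sectional data.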

In a longitudinal panel design, the same set 
of cases is used in each period. In practice, 
there may be some variation from one period 
to another as a result of missing data. For 
example, when cases are individuals, some of 
those individuals may die between one mea- 
surement period and the next, others may 
not agree to cooperate, and others may move 
to new locations and not be found by the 
researcher. All of these are sources of panel 
attrition, and apply primarily to prospective 
panel designs, in which measurement or data 
collection occurs during more than one period 
as well as for more than one period. The com- 
bination of measurement during more than one 
period and for more than one period repre- 
sents, for some scholars, the only true lon- 
gitudinal design, the only design that allows 
the measurement and analysis of intraindivid- 
ual changes in cognitive and behavioral charac- 
teristics of individuals. The prospective panel 
design is here illustrated by Heather Joshi, 
using examples drawn from British longitudi- 
nal cohort studies, in Chapter 5. For this design, 
the techniques presented in Sections II-VI of 
this volume, but not Section VII, are generally 
appropriate. 

The analytical methods in Sections III-VI are 
also appropriate for the analysis of retrospec- 
tive panel designs, in which data collection 
may occur only once, at a single period, but 
the data are collected for two or more periods 
(prior to or during the period in which the
data are being collected). In retrospective panel 
designs, there may be sampling bias as a result 
of excluding respondents who have died by 
the last period for which the data are collected 
(or by the time at which the data are collected), 
or from whom data would have otherwise been 
available for earlier periods but not for the 
last period. In both retrospective and prospec- 
tive panel designs, missing data may result 
from failure of the respondent to remember 
past events, behaviors, or attitudes, or from 
unwillingness by the respondent to divulge 
some information, and also from inability of 
the researcher to locate or obtain cooperation 
from some respondents. In principle, there 
need be no difference in the quality of the 
data obtained in prospective and retrospective 
panel designs, although such differences have 
often been observed in practice. An example 
of a retrospective panel design with extensive 
attention to potential issues of data quality is 
presented in Chapter 6 by Karl Ulrich Mayer, 
using the German Life History Study (GLHS). 
As noted in Menard (2002), the designs dia- 
grammed in Figure 1.1 are not the only possible 
designs for longitudinal research. It is possi- 
ble, for example, to have a revolving sample 
in which subsamples may be dropped for one 
period, then re-included in the sample in a 
subsequent period. It is also possible to have 
a panel design in which cases are dropped, 
without replacement, after they meet some cri- 
terion (e.g., age 21). This latter design would 
result in a monotonically decreasing sample 
size which could pose problems for analysis of 
data from later years of the study (unless the 
design were further modified by replenishing 
the sample with new respondents from younger 
cohorts). The general considerations associated 
with the various designs for data collection do 
not change, however, with modifications of the 
four designs presented in Figure 1.1, and variations
on these basic designs must be evaluated
in terms of their adequacy for describing
short- and long-term historical trends (period
effects), intercohort and intracohort develop- 
mental changes (age effects), separating age, 
period, and cohort effects, and ascertaining not 
only the strength but also the direction of causal 
influences. Total population designs can, in 
principle, be used for practically any type of 
longitudinal analysis, given a sufficient num- 
ber of cases and measurement periods. Other 
designs are more limited, and their appropri- 
ateness must be judged in the context of a par- 
ticular research problem. 
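
The difficulty of separating age, period, and cohort effects mentioned above stems from an exact identity: a case's birth cohort is its period of measurement minus its age. The sketch below uses hypothetical observations to show that no two cases can differ on exactly one of the three variables, so the effect of one cannot be isolated by holding the other two constant without further assumptions.

```python
from itertools import combinations

# Hypothetical respondents: (age at measurement, calendar year of measurement).
observations = [(20, 1990), (30, 1990), (20, 2000), (30, 2000), (40, 2000)]

# Cohort (birth year) is fully determined once age and period are known.
rows = [(age, period, period - age) for age, period in observations]


def differs_in_exactly_one(rows):
    """Return pairs of rows that differ on exactly one of age, period, cohort."""
    pairs = []
    for a, b in combinations(rows, 2):
        n_diff = sum(x != y for x, y in zip(a, b))
        if n_diff == 1:
            pairs.append((a, b))
    return pairs


isolated_contrasts = differs_in_exactly_one(rows)
print(isolated_contrasts)
```

The list of isolated contrasts is empty for any set of observations, because changing one of the three variables necessarily changes at least one of the others.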


3 Measurement issues 
in longitudinal research 


Longitudinal research is subject to all of the 
concerns about measurement that arise in 
cross-sectional research, plus some issues with 
particular relevance to longitudinal research. 
Put another way, longitudinal research has all 
of the problems of cross-sectional research, 
plus a few more. In the second section of this 
handbook, the focus is on those issues most 
specifically relevant to longitudinal research. 
Skipping ahead for a moment, in Chapter 9 
Toon W. Taris discusses reliability issues in 
longitudinal research. Taris examines issues of 
distinguishing unreliability from true change, 
and raises (not for the last time in this volume) 
the issue of the reliability of change scores 
as measures of change. This is followed in 
Chapter 10 by Patterson’s discussion of one of 
the challenging issues in long-term longitudinal 
research on individual change, the possibility 
that it may be appropriate to operationalize the 
same concept in different ways across the life 
course. The issue here is that, on one hand, 
whenever we change the way we measure 
a concept in longitudinal research, if there 
appears to be a change, we cannot be certain 
whether the change results from change in the 
concept we are trying to measure, or change 
in the measurement of the concept. Yet for 
research on individuals over the life course, the 
same measurement at different stages of the life
course may not be validly measuring the same 
concept because different measures are appro- 
priate at different ages. In much of longitudinal 
research, there is an emphasis on consistency 
of measurement, avoiding changes in how a 
concept is measured because otherwise we can- 
not tell whether an apparent change represents 
a true change in the underlying concept or 
merely in the measurement itself. As Patterson 
explains, however, using the same operational- 
ization of the same concept over the life course 
may not always be the best approach, and the 
same underlying concept may manifest itself, 
and thus need to be measured, in different 
ways at different stages of the life course. Taken 
together, the chapters by Taris and Patterson
address the issue of distinguishing true change 
and stability from measurement effects that 
mimic change in longitudinal research. 
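
The concern about the reliability of change scores raised above can be illustrated with the standard classical test theory expression for the reliability of a difference score. This is a sketch under the usual assumption that measurement errors at the two occasions are uncorrelated; it is not necessarily the treatment given in Chapter 9, and the function name and input values are hypothetical.

```python
def change_score_reliability(rel_1, rel_2, corr_12, sd_1=1.0, sd_2=1.0):
    """Reliability of the difference X2 - X1 under classical test theory.

    rel_1, rel_2: reliabilities of the two measurements
    corr_12:      observed correlation between the two measurements
    sd_1, sd_2:   observed standard deviations
    Assumes uncorrelated measurement errors across occasions.
    """
    cov_12 = corr_12 * sd_1 * sd_2
    true_change_var = sd_1**2 * rel_1 + sd_2**2 * rel_2 - 2 * cov_12
    observed_change_var = sd_1**2 + sd_2**2 - 2 * cov_12
    return true_change_var / observed_change_var


# Two measurements that are each fairly reliable (0.8) and moderately
# stable over time (observed correlation 0.6).
reliability = change_score_reliability(rel_1=0.8, rel_2=0.8, corr_12=0.6)
print(round(reliability, 3))
```

In this hypothetical case the change score has a reliability of about 0.5 even though each measurement has a reliability of 0.8, and the figure falls further as the correlation between the two occasions rises.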

The chapters by Taris and Patterson apply to 
longitudinal research in general, whether mea- 
surement is done prospectively or retrospec- 
tively. The remaining chapters deal with issues 
more specific to different types of longitudi- 
nal research. Jennifer Grotpeter in Chapter 7 
provides a general conceptual framework for 
understanding long-term retrospective recall, 
and examines the results of studies of recall as 
it is related to the length of the recall period. 
On this topic, see also Chapter 6 on the 
(retrospective) German Life History Study in 
the previous section, in which Karl Ulrich 
Mayer describes the techniques (and their
results) used to enhance recall in a major ret- 
rospective panel study. Chapter 8 by David 
Cantor examines an issue specific to prospec- 
tive longitudinal research, the effect of panel 
conditioning in panel research. Panel condi- 
tioning potentially occurs when respondents 
react to previous experience of participating 
in the study by changing their behavior or 
answers, possibly in response to their percep- 
tions of what the researcher is seeking, or possi- 
bly to reduce their own burden as respondents. 


Consideration of issues specific to prospec- 
tive longitudinal panel research continues in 
Chapter 11 by Heather Laurie, who discusses 
procedures for minimizing panel attrition in 
longitudinal samples. Despite our best attempts 
to minimize panel attrition, however, circum- 
stances beyond our control (and sometimes 
beyond the control of our respondents) may 
result in missing data in longitudinal designs. 
In Chapter 12, E. Michael Foster and Anna 
Krivelyova present a brief discussion of differ- 
ent types of missing data, along with an exam- 
ple of how to handle nonignorable nonresponse 
in longitudinal research designs. 


4 Descriptive and causal analysis 
in longitudinal research 


The first stage in the process of analyzing lon- 
gitudinal data is to provide a basic description 
of the data. The chapters in Section III present 
issues and techniques which cut across dif- 
ferent types of longitudinal research designs. 
In Chapter 13, Garrett Fitzmaurice describes 
graphical techniques for presenting longitudi- 
nal data. Fitzmaurice shows how exploratory 
graphical techniques in longitudinal research 
help in providing insights prior to estima- 
tion of the model, and are also useful in the 
post-estimation diagnostic phase for examin- 
ing residuals. In Chapter 14, I review the dis- 
tinction between historical and developmental 
change and the issues involved in separating 
the two, with special attention to the disen- 
tangling of age, period, and cohort effects. In 
Chapter 15, John L. Worrall provides an intro- 
duction to pooling cross-sectional and time 
series data, a topic which will recur in other 
chapters in this handbook. In Chapter 16, 
Ronald Schoenberg describes the consequences 
of dynamic misspecification in the use of cross- 
sectional data to model dynamic processes. 
Schoenberg’s chapter indicates the conditions 
under which cross-sectional data may be ade- 
quate to model dynamic processes, and indi- 
cates the consequences of using cross-sectional 
data when those conditions are not met. In
Chapter 17, David Greenberg reviews attempts 
to draw causal inferences from nonexperimen- 
tal panel data, tracing the evolution of causal 
inference in longitudinal research from some 
of the earliest methodological attempts to more 
contemporary approaches. Jos W. R. Twisk in 
Chapter 18 provides a parallel consideration of 
techniques for drawing causal inferences in lon- 
gitudinal experimental research. 


5 Description and measurement 
of qualitative change 


The definitions of qualitative data and qual- 
itative change may be approached from dif- 
ferent perspectives, including how the data 
were collected, and at what level of measure- 
ment (nominal or at most ordinal for qualitative 
data). While consideration of qualitative data 
is not excluded from Section III, the focus is 
on techniques for the presentation and analy- 
sis of quantitative data. Section IV begins with 
Chapter 19, in which Johnny Saldaña describes
an approach to the description and measurement
of qualitative change in qualitative observational
research. Saldaña offers a systematic
approach to organizing and analyzing data from 
qualitative research with an emphasis on trac- 
ing patterns of change in qualitative data. Turn- 
ing from qualitative defined in terms of method 
to qualitative defined in terms of level of mea- 
surement, Alexander von Eye and Eun Young 
Mun in Chapter 20 describe the use of config- 
ural frequency analysis for describing and ana- 
lyzing qualitative change in longitudinal data. 
In configural frequency analysis, the emphasis 
is on tracing change in nominal variables across 
multiple measurement periods to identify nor- 
mative and exceptional patterns of change. In 
Chapter 21, Catrien C. J. H. Bijleveld describes 
the use of optimal scaling techniques, typically 
calculated using alternating least squares (ALS) 
estimation, as a way of “quantifying” qualitative
variables, and the applications of optimal scaling
to the study of change in longitudinal
research. The approaches described by Saldaña,
von Eye and Mun, and Bijleveld are perhaps 
less well known, and typically less well cov- 
ered, than other techniques for longitudinal 
data analysis. More widespread at present, at 
least in the social and behavioral sciences, is the 
use of latent class analysis to identify different 
qualitative “types” of individuals or of patterns 
of behavioral or attitudinal change over time. 
In a companion pair of chapters, C. Mitchell 
Dayton in Chapter 22 provides an introduc- 
tion to latent class analysis, and Jeroen Ver- 
munt, Bac Tran, and Jay Magidson in Chapter 
23 describe the application of latent class mod- 
els in longitudinal research. Taken together, the 
chapters in Section IV offer an array of options
for the analysis of data that are qualitative in 
terms of the research design, the level of mea- 
surement, and the assignment of cases to latent 
qualitative classes in longitudinal research. 


6 Timing of qualitative change: 
event history analysis 


Event history analysis is not so much a sin- 
gle technique as a set of related techniques for 
describing, analyzing, and predicting the tim- 
ing of qualitative change (including whether it 
occurs at all). Section V begins with Chapter 
24 by C.M. Suchindran, in which the most 
basic models for event history analysis, life 
table models for change, are described. These 
models make minimal distributional assump- 
tions, and hence can be described as dis- 
tribution free or nonparametric methods. In 
Chapter 25, Janet M. Box-Steffensmeier and 
Lyndsey Stanfill describe the Cox proportional 
hazards model, a semiparametric technique for 
event history analysis. Parametric event history 
analysis is briefly described and illustrated by 
Hee-Jong Joo in Chapter 26. The proportional 
hazards and parametric event history analysis 
models both assume that measurements occur 


10 Handbook of Longitudindl RESGAPEPY: https:/afrilibrary.com 


fairly continuously in time. This, however, is 
not the case in much social science research, 
which may consist of measurements separated 
by a year or more. For these longer measure- 
ment intervals, discrete time event history anal- 
ysis, as described by Margaret K. Keiley, Nina C. 
Martin, Janet Canino, Judith D. Singer, and John 
B. Willett, allows for the occurrence of many 
events within a single discrete time period. 
Like parametric event history analysis, dis- 
crete time event history analysis makes cer- 
tain distributional assumptions regarding the 
parameters in the model. In contrast to the con- 
tinuous time parametric and semiparametric 
approaches, discrete time event history analysis 
works more easily with time-varying covariates 
and with multiple events occurring in a single 
time interval, and it can be implemented using 
ordinary logistic regression or related (e.g., com- 
plementary log-log regression) techniques. 
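
The reason discrete time event history analysis can be carried out with ordinary logistic regression is that the data are first rearranged into person-period form, with one row for each period in which a case is at risk. The sketch below uses hypothetical records and hypothetical function names, and it computes only the life-table style hazard (events divided by cases at risk in each period). These are the probabilities a logistic regression with one indicator per period and no other covariates would reproduce.

```python
def person_period(records):
    """Expand (case_id, last_period, event) records into person-period rows.

    Each case contributes one row per period at risk. The outcome is 1 only in
    the period in which the event occurs; censored cases contribute only 0s.
    """
    rows = []
    for case_id, last_period, event in records:
        for period in range(1, last_period + 1):
            outcome = 1 if (event and period == last_period) else 0
            rows.append((case_id, period, outcome))
    return rows


def discrete_hazard(rows):
    """Events divided by cases at risk, separately for each period."""
    at_risk, events = {}, {}
    for _, period, outcome in rows:
        at_risk[period] = at_risk.get(period, 0) + 1
        events[period] = events.get(period, 0) + outcome
    return {p: events[p] / at_risk[p] for p in sorted(at_risk)}


# Hypothetical cases: (id, last period observed, whether the event occurred then).
records = [
    ("a", 1, True),
    ("b", 2, True),
    ("c", 2, False),  # censored after period 2
    ("d", 3, True),
    ("e", 3, False),  # censored after period 3
]

rows = person_period(records)
hazard = discrete_hazard(rows)
print(hazard)
```

Because each row carries its own period, time-varying covariates can simply be added as further columns of the person-period rows before a regression model is fitted.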


7 Panel analysis, structural 
equation models, and 
multilevel models 


The statistical techniques in Section VI are primarily
oriented to the analysis of longitudinal
panel data, and would probably be
considered by some to be the most mainstream 
longitudinal analysis methods. The section 
begins with a discussion by Joseph M. Hilbe 
and James W. Hardin in Chapter 28 of the gen- 
eralized estimating equation (GEE) approach 
to the analysis of longitudinal data. The use 
of GEE involves the estimation of parameters 
and standard errors that avoids unrealistic 
assumptions of independence of observations 
in longitudinal analysis and adjusts for the 
dependencies in the data. In Chapter 29, Steven 
E. Finkel describes approaches to linear panel 
analysis with quantitative (interval and ratio 
scaled) outcome variables, and in the following 
chapter, Chapter 30, I describe the use of linear
panel analysis for the analysis of categorical
(dichotomous, polytomous nominal, and polytomous
ordinal) dependent variables, including
the critical issue of how to measure and
model change in categorical variables in linear 
panel models. Taken together with the chap- 
ters by Worrall (15), Greenberg (17), Twisk (18), 
and Hilbe and Hardin (28), these chapters pro- 
vide an overview of the analysis of short-term 
quantitative and qualitative change and causal 
inferences, in which the specific nature of the 
trajectories or patterns of change is typically not 
itself being modeled. 

The next three chapters turn to the model- 
ing of trajectories of change, usually over the 
relatively short term, but potentially involv- 
ing long-term trajectories as well. Michael 
Stoolmiller in Chapter 31 describes the 
latent growth curve modeling technique, based 
on structural equation modeling techniques. 
Latent growth curve models view trajectories 
or patterns of change over time as unobserved 
variables to be treated as latent variables in 
structural equation modeling. In contrast, in 
multilevel growth curve analysis of quantitative 
outcomes, as described by Douglas A. Luke in 
Chapter 32, one typically attempts to fit a man- 
ifest (not latent) polynomial or other function 
to the data to describe the trajectory of individ- 
ual cases over time, and to explain variations in 
those trajectories using a combination of time- 
invariant individual case characteristics and 
time-varying covariates. When the dependent 
variable in the multilevel analysis is categori- 
cal rather than quantitative, it may be appro- 
priate to speak not of “growth” curve analysis, 
but of multilevel change analysis. My focus in 
Chapter 33 is on showing the application of 
the logistic regression framework to the mul- 
tilevel analysis of change, and on highlighting 
some of the contrasts between multilevel change
analysis for categorical dependent variables and
multilevel growth curve models for quantitative
dependent variables, event history analysis,
and linear and logistic regression
panel analysis.
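
The idea of fitting a manifest function to each case's trajectory, described above for multilevel growth curve analysis, can be sketched in a deliberately simplified two-step form: an ordinary least squares line is fitted separately to each case, and the resulting intercepts and slopes are then summarized. This is an assumption-laden illustration with hypothetical data, not a multilevel model as such, which would estimate the case-level and population-level parts jointly.

```python
def fit_line(times, values):
    """Ordinary least squares intercept and slope for one case's trajectory."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


# Hypothetical panel: four measurement periods for each of three cases.
panel = {
    "case_1": ([0, 1, 2, 3], [10, 12, 14, 16]),
    "case_2": ([0, 1, 2, 3], [8, 9, 10, 11]),
    "case_3": ([0, 1, 2, 3], [12, 12, 12, 12]),
}

# Step 1: one fitted (intercept, slope) pair per case.
trajectories = {case: fit_line(t, y) for case, (t, y) in panel.items()}

# Step 2: summarize the case-level slopes; explaining their variation with
# case characteristics would be the next step in a growth curve analysis.
mean_slope = sum(slope for _, slope in trajectories.values()) / len(trajectories)
print(trajectories)
print(mean_slope)
```

The variation in slopes across cases (here 2, 1, and 0) is what time-invariant case characteristics and time-varying covariates are then used to explain.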


8 Time series analysis and
deterministic dynamic models 


Time series analysis stands out from the other 
methods of analysis in this handbook in the 
number of cases and the number of time 
periods. Most often, time series analysis is 
applied to aggregated data for a single case 
(a nation, city, corporation, or other aggregate 
entity, not an individual), or perhaps a handful 
of such cases, typically analyzed separately 
rather than together, as in other methods 
covered in this handbook, and the number of 
time periods is typically large, often over 100. 
In Chapter 34 I provide a brief introduction 
to time series analysis from the perspective of 
longitudinal research, which is a little different 
from the perspective out of which time series 
analysis itself has grown. Here, time series 
analysis is viewed as one tool for longitudinal 
research, with more of a focus on description 
and explanation and less of a focus on forecast- 
ing than is typical in the mainstream time series 
analysis literature. In Chapter 35, William W. S. 
Wei provides an introduction to spectral analy- 
sis, the most mathematically demanding of the 
time series analysis approaches. In Chapter 36, 
David Sanders and Hugh Ward provide further 
details on alternative approaches to time series 
analysis with one or more predictors included 
in the model, and offer a useful comparison of
the different approaches to time series analysis
as applied to the empirical study of public opinion in
political science.

The final two chapters in Section VII also 
involve a higher level of mathematical sophis- 
tication than most of the other chapters in 
this handbook. Steven M. Boker in Chapter 37 
describes the application and estimation of 
differential equation models in longitudinal 
research using a latent variable structural equa- 
tion modeling approach to estimate the para- 
meters of the differential equation model. 
Finally, Courtney Brown in Chapter 38 provides
a brief introduction to the application
of nonlinear dynamics, chaos, and catastrophe
theory to the study of change.


9 Conclusion 


One of the goals of this handbook is to make 
the reader aware of the richness and breadth of 
research design and analytical techniques avail- 
able for longitudinal research. The first section 
of this handbook begins with strong exam- 
ples of each of the major types of longitudinal 
research design. Section II focuses on measure- 
ment issues that arise in longitudinal research 
generally, and also more specifically in par- 
ticular types of longitudinal research designs. 
With each of these designs, the number of cases 
and periods may vary, and as a result of this 
variation, different methods of analysis may be 
appropriate. The number of cases is in princi- 
ple independent of the type of design. In a total 
population design, for example, at the individ- 
ual level, the total population of a tribal society 
may number fewer than 100. In aggregate anal- 
ysis, a cohort or a population, rather than its 
individual members, may be the unit of analy- 
sis, and the number of these aggregate units may 
be small. At the other end of the continuum, 
the revolving sample in the National Crime Vic- 
timization Survey includes over 100,000 indi- 
viduals from 60,000 households. All of these 
possible combinations of type of design and 
number of cases are included within the broad 
category of longitudinal research. 

The number of cases and the number of time
periods, in turn, drive the choice of analytical
methods. With no more than a handful of
cases but many time periods, the time series 
and deterministic dynamic models in Section 
VII are most appropriate. With no more than 
a handful of time periods but many cases, 
panel analytic techniques described in Section 
VI may be best, and as the number of time 
periods increases up to ten or so, techniques 
such as latent and multilevel growth curve and 
change models in Section VI, event history 
analysis in Section V, and the techniques for 
qualitative data analysis in Section IV become
increasingly feasible. In the best of all pos- 
sible worlds for longitudinal research, many 
cases and many time periods, event history 
and multilevel growth curve and change mod- 
els seem at present to offer the best options. 
It is hoped that, by presenting in some detail 
the different designs for longitudinal research, 
issues in longitudinal research design, and tech- 
niques of analysis for longitudinal data, all in a 
single sourcebook, readers will be increasingly 
aware of and better able to make informed selec- 
tions among the different options available to 
best capitalize on the strengths of longitudinal 
research. 
Chapter 2


Using national census data 
to study change 


Margo Anderson 


1 Introduction 


A census is generally defined as an official 
count conducted by a national government of 
a country’s population, economic activity or 
other national phenomena, such as religious 
institutions. A population census determines 
the size of a country’s population and the char- 
acteristics of its people, such as their age, sex, 
ethnic background, marital status, and income. 
An economic census collects information on the 
number and characteristics of farms, factories, 
mines, or businesses. As state sponsored data 
collections, censuses are primarily designed for 
governmental and public policy use. The aca- 
demic research community uses census data 
for grounding much social science research, 
including providing sampling frames for peri- 
odic surveys, for classification schemes and 
standard and consistent questionnaire design 
and wording, and for comparative analysis, 
both spatial and temporal. Censuses are official, 
public, repeated, infrequent, and comprehen- 
sive sources of reliable and relatively simple 
information on a society and also serve as a 
fundamental longitudinal data source for social 
science. 

Most countries of the world currently
conduct population censuses at regular intervals.
By comparing the results of successive
censuses, analysts can see whether the population
is growing, stable, or declining, both in
the country as a whole and in particular geographic
regions. They can also identify general
trends in the characteristics of the population.
Censuses are “complete counts,” not samples of
the phenomenon under study, and accordingly
they are very expensive. They require elab- 
orate administrative operations and thus are 
conducted relatively infrequently. The United 
States, for example, conducts a population cen- 
sus every ten years (a decennial census), and 
Canada conducts one every five years (a quin- 
quennial census). Economic censuses are gener- 
ally conducted on a different schedule from the 
population census, generally every five years. 

Censuses of population usually try to count 
everyone in the country as of a fixed date, 
often known as Census Day. Generally, gov- 
ernments collect the information by sending 
a questionnaire in the mail or a census taker 
to every household or residential address in 
the country. The recipients are instructed to 
complete the questionnaire and send it back to 
the government, which processes the answers. 
Trained interviewers visit households that do 
not respond to the questionnaire and individu- 
als without mail service, such as the homeless 
or those living in remote areas. 
Censuses require significant public cooperation
for their operational success since the goal
of the census is a snapshot or cross-sectional 
information collected at a point in time. The 
responding public needs to know the census
is coming, be aware of the responsibility to
respond accurately and promptly, and be willing
to cooperate. Census questions are designed
to be simple and intelligible for the entire population
under study. For the society in question,
the questions cannot be controversial or
ambiguous, or the quality of the responses deteriorates
rapidly.


2 Official uses of census 
information 


Governments use census information in almost 
all aspects of public policy. In some coun- 
tries, the population census is used to deter- 
mine the number of representatives each area 
within the country is legally entitled to elect 
to the national legislature. The Constitution of 
the United States, for example, provides that 
seats in the House of Representatives should be 
apportioned to the states according to the num- 
ber of their inhabitants. Each decade, the US 
Congress uses the population count to deter- 
mine how many seats each state should have 
in the House and in the electoral college, the 
body that nominally elects the president and 
vice president of the United States. This pro- 
cess is known as reapportionment. States fre- 
quently use population census figures as a 
basis for allocating delegates to the state leg- 
islatures and for redrawing district boundaries 
for seats in the House, in state legislatures, and 
in local legislative districts. In Canada, census 
population data are similarly used to appor- 
tion seats among the provinces and territories 
in the House of Commons and to draw electoral 
districts. 

Governments at all levels, such as cities,
counties, provinces, and states, find population
census information of great value in planning
public services because the census tells
how many people of each age live in different
areas. These governments use census data to
determine how many children an educational 
system must serve, to allocate funds for public 
buildings such as schools and libraries, and to 
plan public transportation systems. They can 
also determine the best locations for new roads, 
bridges, police departments, fire departments, 
and services for the elderly, children, or the 
disabled. 


3 Public and research use of census 
information 


The official uses of census information do not 
exhaust the use of the data. Private researchers,
including businesses, marketing organizations,
and the media, analyze population and
economic census data to determine where
to locate new factories, shopping malls, or 
banks; to decide where to advertise particular 
products; or to compare their own production 
or sales against the rest of their industry. Com- 
munity organizations use census information 
to develop social service programs and service 
centers. Censuses make a huge variety of gen- 
eral statistical information about society avail- 
able to researchers, journalists, educators, and 
the general public. 

In addition to these immediate uses, the aca- 
demic research community makes fundamental 
use of census data for grounding much social 
science research, including providing sampling 
frames for periodic surveys, for classification 
schemes and standard and consistent ques- 
tionnaire design and wording, and for com- 
parative analysis, both spatial and temporal. 
The national census, in short, is a fundamen- 
tal building block for social science research. 
Its particular character as an official, pub- 
lic, repeated, infrequent, and comprehensive 
source of reliable and relatively simple infor- 
mation on a society in turn shapes both the 
research uses to which it can be put, and, more 
significantly, the basic framework for social 
science. 
Finally, the census is one of the oldest
survey formats for social science information,
and in a remarkable number of cases the historical
data, including original questionnaires,
records of administrative procedures, and
the publicly released results, have survived and
are available for current use. The availability,
therefore, of repeated collections of what were 
originally cross-sectional data collections per- 
mits sophisticated longitudinal analysis of the 
information in the censuses. 


4 The census as a source for 
longitudinal analysis 


Censuses, as official and public sources of infor- 
mation, serve many masters. Chief among those 
masters are the governmental or official inter- 
ests that fund the data collection and make ini- 
tial use of the results. The official grounding 
of census data collections in turn guarantees 
that in nations committed to collecting and dis- 
seminating census data, researchers can depend 
upon the quality, public availability, and con- 
sistency of the data. For example, in the United 
States for over two centuries, researchers have 
been able to count upon the release of tabu- 
lated population data from the census. Since 
the 1960s, they have been able to count upon 
the availability of public use micro data sam- 
ples from the complete count (PUMS files) 
for additional research use. Interestingly, how- 
ever, for the researcher intent upon using cen- 
sus information for longitudinal analysis, the 
very strengths of the data source, e.g., accessi- 
bility, comprehensiveness, and long temporal 
run of data, also necessarily present signifi- 
cant methodological issues that must be tack- 
led before one can make effective use of the 
data. In other words, the strengths of this data 
source are inextricably tied to its weaknesses, 
complications, and frustrations. Thus a signif- 
icant methodological literature exists to guide 
the researcher through the minefield of method- 
ological problems and facilitate analysis, and it 
is to these issues that we now turn. 


5 Data availability 


Longitudinal analysis requires repeated col- 
lection of consistent data over time. For a 
researcher intending to use census data, his or 
her first task is to determine when the censuses 
were taken and the amount and type of data pre- 
served and available for current use. Population 
censuses are fundamentally an activity of mod- 
ern states in the West, and date primarily from 
1800 and later. Researchers interested in lon- 
gitudinal analysis of population and economic 
relationships before 1800 must identify addi- 
tional sources with a longer collection history. 
These include, for example, parish registers,
occasional intermittent censuses, or administrative
data, e.g., records of tax collections, or
the heights, weights, and personal information
collected from military recruits.

By the middle of the twentieth century, gov- 
ernments around the world instituted periodic 
censuses and the United Nations has summa- 
rized this information in its annual publica- 
tion, the Demographic Yearbook, and maintains 
links to national statistics offices on its web- 
site. The United States Census Bureau website 
also has links to the national statistical offices 
of most nations for the researcher interested in 
finding census data on a particular country or 
region. 

Once a researcher has determined that the 
historical census data were collected for the 
time period of interest, he or she will face an 
additional set of technical challenges. The first 
question of importance is whether the longi- 
tudinal data file to be created is made up of 
aggregate or micro level data. 


6 Aggregate and microdata 


6.1 Aggregate 


The official agencies that collect the individual 
level census information generally publish 
tabulations of the results in the years imme- 
diately after the count. These data, when 
aggregated for multiple censuses across time,
can provide the aggregate time series of basic 
trends for the geographic area, industry, or 
population subgroup of interest. Researchers 
must then determine if the reporting categories 
were consistent for a number of points in 
time. At the simplest level, one can prepare 
a time series of aggregate population for a 
country, or for basic reporting categories, 
e.g., for the number of males or females in 
a nation’s population. National statistical 
agencies tend to publish such trend data in 
the main or supplementary publications of the 
census or in retrospective compilations. See 
Figures 2.1 and 2.2, which report time series 
results in tabular and graphic form for the total 
United States population and for several basic 
demographic variables. (Additional detail is 
available at “Selected Historical Decennial
Census Population and Housing Counts,”
http://www.census.gov/population/www/censusdata/hiscendata.html.)
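
One way to turn such an aggregate series into a record of change is to compute the change between each census and the one before it. The following Python sketch does this for a few of the resident population counts reported in Figure 2.1. The function name and the selection of years are illustrative assumptions, not part of any Census Bureau product.

```python
# Resident population counts for selected censuses, as reported in Figure 2.1.
population = {
    1950: 151_325_798,
    1960: 179_323_175,
    1970: 203_302_031,
    1980: 226_542_199,
    1990: 248_709_873,
    2000: 281_421_906,
}


def intercensal_change(counts):
    """Return {year: (absolute change, percent change)} relative to the
    preceding census in the series. The earliest census has no predecessor
    and is therefore omitted from the result."""
    years = sorted(counts)
    changes = {}
    for previous, current in zip(years, years[1:]):
        difference = counts[current] - counts[previous]
        percent = 100.0 * difference / counts[previous]
        changes[current] = (difference, round(percent, 1))
    return changes


changes = intercensal_change(population)
for year, (difference, percent) in changes.items():
    print(f"{year}: {difference:+,d} ({percent:+.1f}%)")
```

The same function applies to any reporting category, provided the category was defined consistently at every census in the series.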

Retrospective compilations provide more 
elaborate time series, and also include technical 
analyses explaining the compilation of individ- 
ual series. These compilations require substan- 
tial effort to revisit the quality of individual 
data points in the series, explain breaks and 
ambiguities, and trace the development of clas- 
sifications and concepts. See, for example, His- 
torical Statistics of the United States (HSUS), 
US Bureau of the Census 1976. The most recent 
revision of the US compilation is an ambitious 
scholarly collaboration (Susan Carter, et al., 
2006). 

For more complex analyses, reporting cate- 
gories must remain consistent from census to 
census. The researcher may have to prepare the 
time series by retrieving each data point in the 
time series from each year’s published census 
volume or table. For example, if a researcher
is interested in aggregate population change for
the cities of a nation, he or she will face the issues
that the number of cities tends to grow over time
and that the geographic boundaries of cities also
change.

To use a simple example, the US census 
has reported the population of New York City 
since 1790. The boundaries of the modern 
city of five boroughs (Manhattan, Brooklyn, 
Queens, Bronx, Richmond (Staten Island)) were 
defined in 1898. Thus the researcher must 
decide whether to include in the series com- 
parative data for the period from 1790 to 1890, 
or whether to report only the modern bound- 
aries. See Table 2.1 and Figures 2.3 and 2.4 
which illustrate the issues. Before the 1900 
census, New York City included New York 
County (Manhattan) and a small portion of 
Westchester County (a portion of what is now 
the borough of the Bronx), which was annexed 
to the city in the 1870s. If the researcher uses 
the modern boundaries, the city’s population 
shows a huge jump between 1890 and 1900, 
primarily because the creation of Greater New 
York annexed the nation’s fourth largest city 
(Brooklyn) to New York City. The remaining 
annexed areas of Staten Island, Queens, and 
the Bronx grew rapidly in later years. More 
complex aggregation decisions occur when the 
geographic boundary changes are more com- 
plex, or when the researcher wishes to aggre- 
gate on demographic or economic variables, as 
discussed below. 
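
The effect of the boundary decision can be made concrete with the 1890 and 1900 figures in Table 2.1. The sketch below compares growth over that decade under two definitions of the city: the city as bounded at the time of each census, and the area of the modern five boroughs held constant. The variable and function names are assumptions chosen for illustration.

```python
# Figures from Table 2.1.
OLD_NYC_1890 = 1_515_301        # city within its pre-1898 boundaries
FIVE_BOROUGHS_1890 = 2_507_414  # area of the modern five boroughs in 1890
FIVE_BOROUGHS_1900 = 3_437_202  # consolidated city at the 1900 census


def percent_change(earlier, later):
    """Percent change from an earlier count to a later count."""
    return round(100.0 * (later - earlier) / earlier, 1)


# Series that follows the legal boundaries in force at each census.
as_bounded = percent_change(OLD_NYC_1890, FIVE_BOROUGHS_1900)

# Series that holds the five-borough area constant across both censuses.
constant_area = percent_change(FIVE_BOROUGHS_1890, FIVE_BOROUGHS_1900)

print(f"Boundaries as of each census: {as_bounded}%")
print(f"Constant five-borough area:   {constant_area}%")
```

Most of the apparent growth in the first series reflects annexation rather than population change within a fixed territory, which is why the choice of boundary definition needs to be stated explicitly.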

In short, researchers interested in preparing 
aggregate longitudinal data series are depen- 
dent upon the reporting and publication deci- 
sions of the national statistical agencies that 
compiled the data. If the original data were 
not tabulated and reported on a researcher’s 
category of interest, the data series cannot be 
created from the published results. In the face 
of this problem, many researchers have turned 
to accessing micro level data from past cen- 
suses with the goal of aggregating the results 
into categories of interest, or using the micro- 
data directly for analysis. These data in turn 
present different challenges for longitudinal 
analysis. 
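
Aggregating microdata into researcher-defined categories amounts to recoding each individual record and then counting. The Python sketch below illustrates the idea. The records are hypothetical values invented for the example, and the age groups are an arbitrary choice; real records would come from a microdata file and would usually carry sampling weights.

```python
from collections import Counter

# Hypothetical person-level records (census year, age, sex), invented
# purely for illustration.
records = [
    (1880, 4, "F"), (1880, 37, "M"), (1880, 68, "F"),
    (1900, 12, "M"), (1900, 41, "F"), (1900, 45, "M"), (1900, 70, "M"),
]


def age_group(age):
    """Assign an age to a researcher-defined category that the
    published tabulations may not have reported."""
    if age < 15:
        return "under 15"
    if age < 65:
        return "15-64"
    return "65 and over"


def tabulate(rows):
    """Count persons by census year and age group."""
    counts = Counter()
    for year, age, _sex in rows:
        counts[(year, age_group(age))] += 1
    return counts


table = tabulate(records)
for (year, group), n in sorted(table.items()):
    print(year, group, n)
```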
US Census Bureau

United States Resident Population, 1790-2000

Year    Resident population
2000    281,421,906
1990    248,709,873
1980    226,542,199
1970    203,302,031
1960    179,323,175
1950    151,325,798
1940    132,164,569
1930    123,202,624
1920    106,021,537
1910     92,228,496
1900     76,212,168
1890     62,979,766
1880     50,189,209
1870     38,558,371
1860     31,443,321
1850     23,191,876
1840     17,063,353
1830     12,860,702
1820      9,638,453
1810      7,239,881
1800      5,308,483
1790      3,929,214

[Graph: resident population by census year, 1790-2000; vertical axis from 0 to 300,000,000]

Figure 2.1 United States Census Bureau population growth with graph, 1790 to 2000, http://www.census.gov/dmd/www/resapport/states/unitedstates.xls


6.2 Microdata


Longitudinal analysis of microdata census
results is possible only if the collecting agency
has preserved the individual responses either 
on the original paper schedules, on microfilm 
or related media, or in electronic form. The 
researcher also needs to be able to gain access 
to these data for research use. That is, the
country under study must provide a mechanism
to make the data available. In general,
national statistical agencies have confidentiality
and privacy protections which prevent non-official
access to the individual level responses
to the census questions, i.e., to the microdata.
These strictures have been in place since the 
early twentieth century to prevent abuse of the 
individual respondent’s privacy rights. Though 
rules for research access to microdata from past 
censuses vary by country, many countries will 
preserve confidentiality for a period of years, 
and then allow researcher and public access. 
For the United States, for example, the individ- 
ual level information on the population census 
schedules is protected from public access for 
72 years after the census is taken. Thus in the 
United States, researchers can access microfilm 


[Facsimile of a US Census Bureau table: "Population, Housing Units, Area Measurements, and Density: 1790 to 1990," from the 1990 census United States Summary, Population and Housing Unit Counts. For each census from 1790 to 1990, the table reports total population and total housing units, each with the change from the preceding census (number and percent); total area and land area in square kilometers and square miles; and population and housing units per square kilometer and per square mile. Density is computed using land area.]

Figure 2.2 United States Census Bureau historical population tables, http://www.census.gov/dmd/www/resapport/states/unitedstates.xls
Table 2.1 New York City population, 1790-2000, breakdown by boroughs

Year    Total      Bronx  Brooklyn  Manhattan   Queens  Staten Is  Old NYC
1790    33131          -      4495          -        -       3835    33131
1800    60489          -      5740          -        -       4564    60489
1810    96373          -      8303          -        -       5347    96373
1820   123706          -     11187          -        -       6135   123706
1830   202589          -     20535          -        -       7082   202589
1840   312710          -     47613          -        -      10965   312710
1850   515547          -    138882          -        -      15061   515547
1860   813669          -    279122          -    30249      25492   813669
1870   942292          -    419921          -    41669          -   942292
1880  1206299          -    599495          -    52927          -  1206299
1890  2507414      88908    838547    1441216    87050      51693  1515301
1900  3437202     200507   1166582    1850093   152999      67021  1850093
1910  4766883     430980   1634351    2331542   284041      85969  2331542
1920  5620048     732016   2018356    2284103   469042     116531  2284103
1930  6930446    1265258   2560401    1867312  1079129     158346  1867312
1940  7454995    1394711   2698285    1889924  1297634     174441  1889924
1950  7891957    1451277   2738175    1960101  1550849     191555  1960101
1960  7781984    1424815   2627319    1698281  1809578     221991  1698281
1970  7894862    1471701   2602012    1539233  1986473     295443  1539233
1980  7071639    1168972   2230936    1428285  1891325     352121  1428285
1990  7322564    1203789   2300664    1487536  1951598     378977  1487536
2000  8008278    1332650   2465326    1537195  2229379     443728  1537195

Source: New York City Department of City Planning, Change in Total Population, 1990 and 2000, New York City and Boroughs,
http://www.ci.nyc.ny.us/html/dcp/html/census/pop2000.shtml. Ira Rosenwaike, Population History of New York City
(Syracuse, NY: Syracuse University Press, 1972); United States Census Bureau, ‘Population of the 100 Largest Cities and Other
Urban Places in the United States: 1790 To 1990’, http://www.census.gov/population/www/documentation/twps0027.html
Figure 2.3 New York City population, 1790-2000


copies of the population schedules for the cen- 
suses from 1790 to 1930 (with the exception of 
the schedules for the 1890 census which were 
burned in a fire in the early 1920s). The Cana- 
dians release data after 92 years. 
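
The closure rules described above reduce to simple arithmetic on census years. The following Python sketch lists the censuses whose individual-level schedules would be open under a fixed closure period. The function names and the reference year of 2006 are assumptions made for illustration.

```python
def first_public_year(census_year, closure_years):
    """Year in which individual-level records first become public."""
    return census_year + closure_years


def open_censuses(census_years, closure_years, as_of_year):
    """Census years whose closure period has expired by `as_of_year`.
    This applies only the closure rule; it does not check whether the
    schedules survive (the 1890 US schedules, for example, do not)."""
    return [
        year for year in census_years
        if first_public_year(year, closure_years) <= as_of_year
    ]


us_census_years = range(1790, 2001, 10)   # decennial censuses, 1790-2000
us_open = open_censuses(us_census_years, closure_years=72, as_of_year=2006)

print(us_open[-1])                  # most recent census open under the rule
print(first_public_year(1940, 72))  # year the 1940 schedules become public
```

Under a 72-year rule and the assumed reference year, the most recent open census is 1930, which agrees with the range given in the text.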

Starting with the 1960 census, the US Census 
Bureau also prepared a public use file, a 1% 
or 5% sample of the complete count. Data in 
the samples are coded to protect confidentiality 
such that no individual’s information can be 
identified in the file prior to the 72-year limit. 
To prevent disclosure, the agency restricts the 
geographic detail available on the cases of the 
sample. 

Since the 1970s, with grant funding chiefly 
from the National Institutes of Health and 
the National Science Foundation, a series of 
Figure 2.4 New York City population, 1790-2000,
breakdown by boroughs 


researchers, most notably the demographic 
historians at the Minnesota Population Center 
at the University of Minnesota, have created 
public use files of past US censuses for 1850- 
1880, and 1900-1950, and have developed 
standardized coding and web-based file deliv- 
ery systems to create “integrated public use 
microdata samples” for the US population 
censuses from 1850 to 2000. These data, with 
documentation, bibliographies of research 
use, facsimiles of original questionnaires, and 
technical papers, are available at the IPUMS 
website, http://www.ipums.org. Other nations 
have also prepared public use files of their 
censuses. For a current inventory of population 
censuses with information on the current 
knowledge about surviving microdata files, see 
http://www.ipums.org/international/microdata_inventory.html.
The Minnesota Population
Center has also developed an “international
integrated public use microdata sample”
project, or IPUMS International
(http://www.ipums.org/international/index.html), to develop
a web-based harmonized data delivery system
similar to that developed for the United States
population censuses. These microdata files


Table 2.2 Availability of data points in historical PUMS files

Country           Data points   Span of data
Norway            12            1801-2001
United Kingdom     7            1851-2001
Canada            12            1871-2001
Argentina          7            1869-2001
Finland            9            1950-2001
United States     15            1850-2000

Source: Integrated Public Use Microdata Series International,
Census Years and Microdata Inventory, April 1, 2006,
http://www.ipums.org/international/microdata_inventory.html


almost all date from 1960 or later and thus 
include only data points from four to five 
censuses. Only a few nations (Table 2.2) have 
extant microdata files from before 1960. The 
United States has the most, with 15. Norway
has the longest data run, with the earliest census
data from 1801. One can expect that in the
years to come additional files will be created, 
as the promise of developing such longitudinal 
series has proved itself in the past 15 years. 


7 Questions 


7.1 Availability 


A second methodological issue facing 
researchers using longitudinal census data is 
the availability and character of the questions 
asked and reported over time. Broadly speak- 
ing, the number of questions and the amount 
of detail collected increases over time so that 
more information is available in current census 
reports than in older ones. For example, the 
first census of the United States (1790) asked 
only for the name of the household head and 
for five additional pieces of information for 
the household (the number of free white males 
16 and over; the number of free white males 
under 16; the number of free white females; the 
number of slaves, and the number of other free 
people). By 1850, each person in the United
States was identified on a separate census 
line. The census asked 13 questions of the free 
population and 8 for the slave population. The 
resulting data provided detailed information on 
the address, household relationships, age, sex, 
race, nativity, occupation, educational, marital 
and disability status, and property owned. 
See http://www.ipums.org/usa/voliii/items1850.html
for the 1850 schedule for the free
population and http://www.ipums.org/usa/voliii/tEnumForm.shtml
for access to all the
forms from 1850 to 2000. For the facsimiles
of the slave schedules, and for pre-1850
census schedules, see US Bureau of the
Census, 1979; US Bureau of the Census,
1973. See also United States Census Bureau,
“Selected Historical Decennial Census Population
and Housing Counts,”
http://www.census.gov/population/www/censusdata/hiscendata.html,
for links to forms and enumerator
instructions from 1790 to 2000.

In 2000, respondents to the US census 
answered six basic questions: name and 
address, household relationship, age and place 
of birth, sex, race and ethnicity, and housing 
tenure (whether the dwelling was owned or 
rented). A one-in-six sample of households
received a “long form” questionnaire with a
total of 53 questions on demographics, housing
conditions, occupation and income, migration,
citizenship and languages spoken, educational
attainment, and disability status. For copies
of the forms, see http://www.census.gov/dmd/www/2000quest.html.

The IPUMS website, www.ipums.org, and 
Twenty Censuses and Population and Hous- 
ing Inquiries in U.S. Decennial Censuses, 
1790-1970 (US Bureau of the Census, 1979; 
1973) provide tables listing the availability 
of particular American census questions over 
time. The IPUMS site also lists coding schemes 
for answers over time. Thus a researcher can
determine if data were collected consistently
on a particular item of interest, and how the
question was framed and responses recorded at
each census.

Some basic questions, e.g., age and sex,
are relatively straightforward and thus the
responses can be assumed to be historically
consistent. The basic questions and the coding
schemes are fundamentally consistent over
time, though even with such basic questions,
methodological problems, e.g., age heaping, can
exist in the original data. Other questions, even
seemingly transparent ones such as
race, place of birth, or educational attainment,
are not consistent on closer inspection. Three types of
issues emerge: conceptual change in the phe- 
nomenon measured; changes in classification 
schemes; and level of detail in the answers. 
A number of examples illustrate the issues. 


7.2 Changes in conceptualization 
of a phenomenon 


Longitudinal researchers are often interested 
in the origins of a phenomenon, asking when 
something started, or, relatedly, when it ended. 
For census questions, the measurements of 
economic and educational status illustrate the 
issues. Mass public education is an innovation 
of the nineteenth century in modern societies, 
and thus for some older censuses, the earliest 
questions on the educational characteristics of 
the population asked a yes/no question about 
literacy, e.g., whether someone could read or 
write, or was literate in a particular language. 
By the twentieth century, the question changed 
to one of educational attainment, e.g., the num- 
ber of years of schooling for an individual. 
Similar changes occurred in questions on eco- 
nomic status. The United States, for example, 
asked questions about the ownership of real 
and personal property from 1850 to 1870 and 
income from 1940 onward. The main question
on economic status for the majority of historical
censuses, however, was a question on occupation,
first asked in 1820, and reported consistently
since 1840. For the censuses from 1850
on, the individual-level occupation question
was open ended. For such questions there were
also frequently age and sex thresholds, which
limited the question to adults, males, household
heads, or individuals capable of owning
property.

Such changes in questions over time present 
difficulties for longitudinal analysis and require 
the researcher to evaluate the meaning of the 
changes. Different strategies are appropriate 
depending on whether the variable is the cen- 
tral focus of the analysis or a correlate or control 
for the analysis of another item of interest. 

For analyses where the object of study is 
changing over time, the shifting variables are 
themselves evidence of the historical phe- 
nomenon of interest. Asking a mid-nineteenth- 
century American about his income would have 
elicited a confused response, since most people 
did not work for a wage or salary. Similarly, ask- 
ing for the number of years of schooling com- 
pleted made little sense when schooling was 
intermittent and schools scarce. 

For researchers interested in using a measure 
of economic or social status as a correlate for 
changes over time, a number of techniques exist 
to develop a consistent measure. Most common
is a scale measuring socioeconomic status (SES
or SEI), which assigns a numeric value to an
occupational code, and thus can be used to assign a
common variable code for economic status for
census results in which occupation has been
recorded. See the discussion of the SEI variable
in the IPUMS data,
http://www.ipums.org/usa-action/variableDescription.do?mnemonic=SEI.
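The logic of such a scale can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the occupation labels, the scores, and the names `SEI_BY_OCC` and `assign_sei` are hypothetical placeholders, not the published IPUMS SEI values, and the sketch assumes that occupations have already been recoded to one common scheme.

```python
from typing import Optional

# Hypothetical occupation -> status score lookup (invented values,
# NOT the published SEI scores).
SEI_BY_OCC = {"farmer": 14, "clerk": 44, "physician": 92}


def assign_sei(record: dict) -> Optional[int]:
    """Return the status score for a record's occupation, or None if unscored."""
    return SEI_BY_OCC.get(record.get("occ"))


# Person records from different censuses receive one comparable variable.
records = [
    {"year": 1880, "occ": "farmer"},
    {"year": 1950, "occ": "clerk"},
    {"year": 1950, "occ": None},  # no occupation reported
]
scores = [assign_sei(r) for r in records]
```

Records without a scorable occupation are left as missing rather than imputed, so the age and sex thresholds that restricted the occupation question in older censuses remain visible in the derived variable.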


7.3 Classification changes 


A related issue for longitudinal analysis of cen- 
sus responses involves changes in classifica- 
tion and coding schemes over time. Similar 
to the changes in the conceptualization of the 
question, classification changes may represent 
the origin or elimination of a phenomenon. The 
most complex coding scheme again involves the 
occupation variable. Computer programmers
did not exist in the nineteenth century, nor
do “intelligence office keepers” exist today.¹
Thus the researcher requires a standardized 
coding scheme to accommodate the change rep- 
resented in the occupation codes themselves. 
The IPUMS project has confronted these issues 
and has developed several systems for longitu- 
dinal occupation coding. Additional guidance 
on the methodological issues involved is avail- 
able on their website, http://www.ipums.org. 
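A standardized coding scheme of this kind amounts to a crosswalk from year-specific codes to a common category. The following Python sketch is hypothetical throughout: the category labels, the `CROSSWALK` table, and the `harmonize` function are invented for illustration and do not reproduce any IPUMS occupation classification.

```python
# Hypothetical crosswalk: (census year, original code) -> harmonized category.
CROSSWALK = {
    (1880, "intelligence office keeper"): "employment agency operator",
    (2000, "employment agency manager"): "employment agency operator",
    (2000, "computer programmer"): "computer programmer",
}


def harmonize(year: int, code: str) -> str:
    """Map a year-specific occupation code to the common scheme."""
    # Codes with no entry are flagged for review rather than guessed at.
    return CROSSWALK.get((year, code), "UNCLASSIFIED")
```

Keying the table on both year and code matters because the same label can denote different work in different censuses; flagging unmapped codes keeps the researcher's classification decisions explicit.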

Similar problems plague coding schemes 
for geographic variables, e.g., place of birth 
or current address. As with the discussion 
above of the changes in city boundaries, the 
researcher must decide how to code and eval- 
uate addresses and geographic codes which 
change over time. For example, for a study of 
occupational and geographic mobility in the 
United States, a person born in 1905 in the 
Russian empire might have emigrated to the US 
from the Soviet Union of 1921, from a town 
which today is in Ukraine. 

Even variables with more limited coding 
schemes, such as race, are significantly more 
complex when used as longitudinal variables. 
Margo Anderson and Stephen E. Fienberg 
(2001, pp. 177-78) list the race categories from 
the US census from 1820 to the present. A per- 
son in this scheme could change his or her race 
from census to census because of changes in the 
coding conventions. 


7.4 Detail 


A final issue of comparability over time in 
longitudinal census data involves the level of 
detail reported on particular variables. Older 
censuses from the pre-computer age tended to 
employ simple response categories for a closed 
end question and employ few follow-up ques- 
tions to clarify a phenomenon. For questions on 
disability, for example, older censuses asked if 
a person was blind, deaf, “insane or idiotic.” 


¹ “Intelligence office keepers” ran employment agencies
(Margo Anderson Conk, 1980).




Current questions on disability include detailed 
information on the nature of the disability, and 
its impact in terms of functional impairment in 
the activities of daily living, and the duration 
of disability. 

Thus, though it is often possible to get a 
fairly lengthy and detailed time series of data 
by aggregating data from multiple censuses, the 
researcher must expect to put significant
effort into understanding any changes in
question concept, coding, and reporting over time,
and into reconciling the differences among the
various schemes before analysis. Not surprisingly,
therefore, a substantial methodological literature
exists on solving these problems. In particular
areas, e.g., historical race classification, this
literature has formed a substantive subfield in
its own right as part of the history of race in
modern societies. See, for example, Joel Perlmann
and Mary Waters, 2002; Melissa Nobles,
2000; Clara Rodriguez, 2000; F. James Davis,
1991; Anderson and Fienberg, 2001; Tukufu
Zuberi, 2001.


8 Uses of census data in research 


8.1 Trends 


The most common form of research using longi- 
tudinal census data is the time series or analysis 
of trends over time. The simplest of such analy- 
ses chart the changes over time of reported data 
either in tabular or graphic form with calcula- 
tions of percentage or absolute change between 
data points (Figure 2.1). Such reports of aggre- 
gate trends, particularly when graphed, almost 
naturally raise questions of the determinants, 
correlates or causes of the changes. The classic 
linear regression model became the fundamen- 
tal technique to explore such temporal change. 
Since most census data record rapid population 
growth in the nineteenth and twentieth cen- 
turies, such analysis of aggregates has focused 
on analyzing the rates and volume of change, 
whether stable or erratic. Such analysis makes
it possible to identify the impact of historical
events external to the census on the data
series. War, economic depression, or changes
in a nation’s population policy on immigration
can then be identified in the series, and the
relevant variables included in the model to test
various hypotheses.
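The simplest of these calculations, absolute and percentage change between consecutive censuses, can be sketched as follows. The population figures are hypothetical round numbers chosen for illustration, not actual census counts, and `census_changes` is a name introduced here.

```python
def census_changes(series):
    """Return (year, absolute change, percent change) for consecutive censuses."""
    out = []
    for (y0, p0), (y1, p1) in zip(series, series[1:]):
        out.append((y1, p1 - p0, 100.0 * (p1 - p0) / p0))
    return out


# Hypothetical population counts in thousands (not real census figures).
series = [(1800, 1000), (1810, 1300), (1820, 1690)]
changes = census_changes(series)
# Absolute change rises (300, then 390) while the rate stays at 30 percent.
```

The example is constructed so that the volume of growth increases while the rate is stable, which is why a graph of raw totals and a graph of rates of change can tell different stories about the same series.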

See, for example, Walter Nugent (1981) 
for the analysis of the growth patterns dis- 
played in the decennial census. In the basic 
graph of population change for the US, the 
United Kingdom and France from 1790 to 2000 
(Figure 2.5), the dramatic numeric increase 
in the US population is the dominant les- 
son of the image. Regraphing on the rate of 
change by census year reveals a different pat- 
tern (Figure 2.6), namely two periods of sta- 
ble rates of growth connected by a declining 
trend between the two. Nugent used the pat- 
terns to explore the transition of the United 
States from an agricultural and “frontier rural” 
nation to an urban nation. Roderick Floud 
(1979, pp. 88-137) provides a related example 
in an analysis of British domestic exports from
1820 to 1850.

Trend patterns can also be displayed geo- 
graphically. The United States Census Bureau 
has, since the 1870s, reported a statistic on 
the “center of population” for the nation.

Figure 2.6 Decomposing growth trends, United
States, 1790-2000 (reproduced with permission)
Source: Walter Nugent (1981). Structures of American
Social History. Bloomington: Indiana University Press
http://images.questia.com/fif=b48586/b48586p0027.fpx&
init=0.0,0.0,1.0,1&rect=0.0,0.25,0.5,0.75&wid=300&hei=205
&vtrx=1&lng=en_US&enablePastMaxZoom=OFF&xFactor=
100&page=qView.html&obj=uv,1.0&cmd=ZOOM_OUT


Theoretically, the “center of population” is the
“place where an imaginary, flat, weightless and
rigid map of the United States would balance
perfectly” if everyone were of identical weight
(http://www.census.gov/geo/www/cenpop/cb01cn66.html).
See Figure 2.7, which graphs the
mean center of population for each census
since 1790. The starred data point moved
westward from 1790 to 1880, then slowed
as the frontier closed and the country urbanized
until 1940, then moved southwestward
into Missouri by 2000 with the growth of
the Sunbelt. The use of a geographic
visualization of a temporal pattern of change
provides a powerful analysis of longitudinal
data.
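The balance-point definition corresponds to a population-weighted mean of locations. The sketch below treats the map as flat and simply averages latitude and longitude, which is a simplifying assumption; it does not reproduce the Census Bureau's published computation, and the coordinates and populations are hypothetical.

```python
def mean_center(places):
    """Population-weighted mean of (lat, lon) over (lat, lon, population) rows."""
    total = sum(pop for _, _, pop in places)
    lat = sum(la * pop for la, _, pop in places) / total
    lon = sum(lo * pop for _, lo, pop in places) / total
    return lat, lon


# Two hypothetical settlements: a large eastern one and a smaller western one.
places = [(40.0, -75.0, 300), (38.0, -90.0, 100)]
center = mean_center(places)  # (39.5, -78.75)
```

Because each place is weighted by its population, growth in the smaller western settlement would pull the center westward even if the eastern settlement did not shrink, which is the movement the figure traces across successive censuses.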


8.2 Denominator data


Longitudinal census data are also frequently 
used as denominators in the analysis of other 
issues of interest, e.g., voting patterns, vital 
rates, or disease patterns. In such cases, the 
object of study requires data on the total population
from which the population of interest
is drawn. For example, the analysis of electoral
behavior and voter participation requires
information on the overall population from
which actual voters are drawn. Historians in the
United States have made good use of local area
census results from the nineteenth and twentieth
centuries to chart and analyze the growth of
mass democracy and electoral turnout, to identify
the determinants of voter mobilization and
party success in particular local areas, and to
develop the theory of critical elections and electoral
realignment. For detail on the substantive
results of this literature, see Theodore Rosenoff
(2003).
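A turnout rate illustrates the denominator role directly: votes cast divided by the population eligible to vote. Because elections rarely fall in census years, the analyst needs some rule for intercensal denominators. The sketch below uses linear interpolation between censuses purely as an assumption; it is not presented as the method of the studies cited, and all figures and names are hypothetical.

```python
def interpolate_population(year, census):
    """Linearly interpolate a population between the two bracketing censuses."""
    years = sorted(census)
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            share = (year - y0) / (y1 - y0)
            return census[y0] + share * (census[y1] - census[y0])
    raise ValueError("year outside the range of available censuses")


# Hypothetical eligible population of one county at two censuses.
census = {1880: 10000, 1890: 12000}
votes_cast_1884 = 8400  # hypothetical election return
denominator = interpolate_population(1884, census)
turnout = votes_cast_1884 / denominator
```

Any undercount in the census feeds straight into the estimated rate, so the quality of the denominator deserves as much scrutiny as the election returns themselves.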

The need for systematic development of longitudinal
datasets for electoral analysis was recognized
in the 1960s. Political scientists who
recognized the need for such datasets founded
the Inter-university Consortium for Political
and Social Research (ICPSR) in 1962 to coordinate the
creation and preservation of electronic social
science data files which were too large for any
single researcher to assemble. Its first compilations
were local area US presidential election
returns from 1824 on, and candidate and
constituency totals from 1788 on (ICPSR 1 and
ICPSR 2). The third data file in the archive
(ICPSR 3) compiled the US census denominator
data required for electoral analysis, i.e.,
“detailed county and state-level ecological or
descriptive data for the United States” from the
published decennial census volumes from 1790
forward.²


² Inter-university Consortium for Political and Social
Research, United States Historical Election Returns,
1824-1968 (ICPSR 1); Candidate Name and Constituency
Totals, 1788-1990 (ICPSR 2); Historical,
Demographic, Economic, and Social Data: the United
States, 1790-1970 (ICPSR 3) [Computer files]. Ann
Arbor, MI: Inter-university Consortium for Political
and Social Research [producer and distributor],
last updated 1999, 1995, 1992 respectively. (See
www.icpsr.umich.edu.)

Figure 2.7 United States center of population, 1790-2000, http://www.census.gov/geo/www/cenpop/cb01cn66.html,
http://www.census.gov/geo/www/cenpop/meanctr.pdf.


8.3 Record linkage 


Theoretically, censuses count the entire population
of a nation at regular intervals and record the
name and address of individuals. For censuses
no longer covered by confidentiality protec- 
tions, it should be possible to locate particu- 
lar individuals in multiple censuses, or with 
other contemporary data sources such as city 
directories, property records, or related data. 
Genealogists, of course, pursue such linkages 
in search of family lineages, and family histori- 
ans have also made good use of such records to 
construct family histories. Substantial method- 
ological resources have been developed to facil- 
itate such research. For example, it is possible 
to search US census manuscripts by the name 
of the person using the SOUNDEX indexes.

Other historians have conducted such
research with mixed results because of the
weaknesses in using census data for such
linkage. The problems are several. First, the
research is extremely time-consuming. Though
copies of the manuscript schedules are available
in research libraries, individuals need to
be traced through multiple originally handwritten
data sources on microfilm. Second, matching
individual cases can be problematic because
of variant name spellings and name changes across
censuses. These are particular problems in
searching for women, who typically changed their
names at marriage. Common names, e.g., John Smith, require
additional information from related variables,
e.g., an age or occupation link, to confirm a
match. Third, as many scholars discovered, census
coverage is incomplete, and estimates of
the level of incompleteness are quite imprecise
before the 1940s.
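The matching problem can be made concrete with a small sketch. The `soundex` function below follows the standard American Soundex rules, which group similar-sounding consonants so that variant spellings share a code. The `candidate_match` function, its field names, and its two-year age tolerance are assumptions introduced for illustration, not a published linkage procedure.

```python
# Letter -> digit table for American Soundex.
CODES = {c: d for d, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items() for c in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(c, "")  # vowels map to "" and reset the run
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def candidate_match(a: dict, b: dict, years_apart: int = 10, tolerance: int = 2) -> bool:
    """True if surnames share a Soundex code and ages agree within a tolerance."""
    same_name = soundex(a["surname"]) == soundex(b["surname"])
    expected_age = a["age"] + years_apart
    return same_name and abs(b["age"] - expected_age) <= tolerance
```

For example, `soundex("Smith")` and `soundex("Smyth")` both return `"S530"`, so a 30-year-old Smith in one census and a 41-year-old Smyth in the next would be flagged as a candidate pair. A common surname will still produce many candidates, which is why a further variable such as occupation or birthplace is needed to confirm a link.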

A substantial number of studies
attempted to link individuals over time
and trace occupational mobility from fathers
to sons. One of their major findings proved
to be the volatility of the population itself.
Some 40-80% of the cases could not be traced,
which confirmed a different picture of historical
mobility from the one initially hypothesized.
See, for example, Stephan Thernstrom, 1964;
Avery Guest, 1987; Peter Knights, 1991a; 1991b;
Joseph Ferrie, 1996.

Currently underway are new projects based 
upon more advanced methods to attempt to 
improve linkages between census dates. For 
further details, see Steven Ruggles (2003a). 
In the UK, the Office for National Statistics
has built a prospective individual-level
longitudinal linked sample database (the
Longitudinal Study or LS) from the decennial
censuses of 1971 and later. There
are confidentiality restrictions on these data
(see the discussion above on confidentiality
issues in longitudinal census research).
See http://www.celsius.lshtm.ac.uk/what.html
for details on the files and their use.


8.4 Sampling frames 


Since the development of probability sampling 
methods in the 1930s, the results of the com- 
plete count census have been used to inform 
the frames for sample surveys, both officially 
within governments and by private researchers. 
To serve this function, the accuracy of the 
complete count data must be assured. That is, 
the researchers creating the sample need esti- 
mates of overcount, undercount, and bias in 
the underlying census data collection. Modern 
central statistical agencies provide such esti- 
mates. Researchers interested in drawing sam- 
ples from data collected before the 1930s, either 
in the census itself or from related historical 
data such as property records or vital records, 
will generally have to decide how to approach 
the question of census accuracy for their particular
design. For a discussion
of this issue in the context of the analysis of
fertility decline in the United States, see Hacker
(2003).


9 The potential of longitudinal 
analysis of census data 


The literature cited in the discussions above 
provides evidence of the importance that schol- 
ars and officials within central statistical agen- 
cies have placed on the longitudinal analysis 
of census data. The availability in the last 10 
to 15 years of much more extensive electronic 
data files, and the potential of many more to 
come, has boosted the interest in the field, and 
is beginning to produce important results. Illus- 
trative of this work is James Gregory’s recent 
book, The Southern Diaspora (2005), which 
expands the understanding of migration out of 
the American South in the twentieth century. 
Studies of the large-scale Black migration 
from the agricultural areas of the American 
South to the cities of the North and West are 
well known, as are numerous case studies of 
migration, such as of whites from Appalachia 
to the industrial Midwest or the Okies to 
California. But until the availability of IPUMS
microdata, it was not possible to trace
systematically the massive out-migration from the
American South. Twenty-eight million people
(Gregory, 2005, pp. 19, 330) left the American
South for the North and West in the twentieth
century, dramatically changing the economy,
culture, and politics of the region they left
and the regions they settled. The newly available
longitudinal microdata make it possible to
identify the occupational and educational
characteristics of the migrants, as well as explore
family structure, return migration, and the
impact of the migrants on the receiving
communities. Among his findings is that the white
southern diaspora was considerably larger than
the black diaspora, but less visible, since southern
whites tended to migrate to smaller cities
and to be less conspicuous in their new
destinations.

Patricia Hall and Steve Ruggles (2004) have 
produced equally provocative results from 
their analysis of longitudinal IPUMS data. 
They return to Frederick Jackson Turner’s 
famous analysis of the significance of the fron- 
tier in American history, exploring internal 
migration from 1850 to 2000, confirming much 
of Turner’s argument, while noting the impact 
of a new wave of suburban migration in the 
second half of the twentieth century. See also 
Steven Ruggles (2003b) on longitudinal analy- 
sis of US family structure. 


10 Historical context and the 
politics of numbers 


As noted at the outset, census data are col- 
lected by the national state and thus have the 
authority of the state and the interests of the 
state embedded in them. In all states, whether 
democratic, authoritarian, or imperial, a pol- 
itics of numbers frames the data collection, 
reporting, and preservation process, which in 
turn affects potential historical research from 
the data years and centuries later. A researcher 
can trace the contemporary political contro- 
versies of the day in the methods and reports 
published at the time of the census, and is 
well advised to spend some time understand- 
ing the issues in order to judge the quality and 
capacities of the data for longitudinal analy- 
sis. One might expect that the censuses from 
culturally similar, primarily English-speaking 
societies of Britain, Canada and the US, would 
be quite similar, and in many ways they are. 
These nations all have long traditions of data 
collection, quite similar enumeration methods, 
and practices of publishing extensive census 
results. Nevertheless, there are important differences
in emphasis, questions, and reporting
styles for the data, which derive from the different
uses and political traditions of their national
states.

The American census is the oldest compi- 
lation and the first census to be used for the 
purposes of legislative apportionment. Accordingly,
the questions asked and the tabulations
prepared have always been closely related
to the political and demographic controversies
of the society. In the nineteenth century,
these were issues of race, population
growth and change, and migration (Margo 
Anderson, 1988; Anderson and Fienberg, 2001). 
The US census has recorded the racial charac- 
teristics of the total population directly since 
1820, and of the white population from 1790 
to 1810. At the height of the early twen- 
tieth century immigration surge, the census 
asked 10 questions on immigrant status, mother 
tongue and English-speaking ability, and cit- 
izenship status. The United Kingdom census, 
by comparison, did not ask a question on 
the “ethnic group” of the respondent until 
1991. 

By contrast, the situation and evolution of
social classes underpinned many of the questions
and much of the analysis in the UK census in the
nineteenth and early twentieth centuries (Edward
Higgs, 1996; Simon Szreter, 1996). In the first
urban nation, and the home of the industrial
revolution, precise information on respondents’
occupations and occupational classifications
were developed to provide denominator data
to evaluate the fertility, mortality and overall
health of the working classes and middle
classes.

In the Canadian census one sees a third 
interest, namely in the language background 
of a population that was created when English 
and French colonies joined to form a national 
state (Bruce Curtis, 2001). At the census
of Canada in 1871 (Figures 2.8 and 2.9,
http://www.collectionscanada.ca/genealogy/022-911-e.html),
questionnaires were prepared
in French and English. In 2001, the Canadian
census had questions on the first language
learned in childhood, languages understood
and spoken at home, knowledge of official
and non-official languages in the various
regions of Canada, and on language used
at work.

A related issue in the use of census data for 
longitudinal analysis involves recognizing that 
the published data may be tainted in some form. 
For example, it is generally recognized that the 
reports of insanity by race for the US census 
of 1840 are wrong and there have been major 
controversies about the quality of the popula- 
tion count in the American South at the 1870 
census, so much so that the Census Bureau “cor- 
rected” the totals in later reprints. For details, 
see Patricia Cline Cohen, 1982; Anderson, 1988; 
Hacker, 2003.

Figure 2.8 1871 Canadian census form, English


In other words, if a researcher intends to cre- 
ate his or her own longitudinal data file by 
compiling reported data from past censuses, 
or by retabulating the original returns, he or 
she must investigate the data collection pro- 
cess at each census used. Alternatively, the 
researcher can draw upon electronic or pub- 
lished resources which disseminate longitudi- 
nal data, e.g., use the data files from IPUMS 
or HSUS, where the researcher can be assured 
that the data compilers have addressed the 
methodological issues involved in preparing 
the longitudinal file. Even then, the researcher 
should thoroughly review the technical reports 
accompanying the data to make sure that
no unforeseen issues plague the data
analysis.

Source: Canadian Genealogy Centre, Library and Archives Canada, Census, http://www.collectionscanada.ca/the-public/005-

Figure 2.9 Canadian census form, French


11 A final note


This essay has leaned heavily on examples from 
American census practices to illustrate method- 
ological issues. It has done so because there is 
a particularly rich tradition of unbroken cen- 
sus data collection in the United States, and 
thus not surprisingly a rich tradition of research 
using historical census data, both aggregate 
and micro level, as well as data development 
projects which have confronted in systematic 
fashion the technical issues of historical census 
research. 

Similar data traditions and research exist in 
Scandinavian nations, though the “census” is 
a different type of data collection. The United 
Kingdom (and its former colonies) and Canada
have also pioneered historical census use,
though their relatively more restrictive rules
on access to microdata and interests in demographic
and economic research in the pre-census
era have led to the development of
different types of data sources, particularly
parish registers and trade records of economic
activity.

Nations without stable governing regimes 
over long periods, or nations with changing 
political boundaries, present different chal- 
lenges for compiling and using longitudinal 
census data. National regime change tends to
disrupt or change the census data collection
and reporting process, making the compilation
of aggregate data series more difficult,
and the preservation of microdata precarious.
Demographic historians, for example, have
recently begun to find and use census data collected
during the Soviet regime.³ Researchers
are directed to projects which compile international
historical statistics generally (e.g.,
Mitchell, 2003), and to the United Nations Statistical
Office.⁴

Broadly speaking, the potential for further 
development of longitudinal census data, both 
aggregate and microdata files, is large. As 
noted above, the computer revolution has 
made it feasible to retrieve and retrofit data 
preserved on paper or microfilm to elec- 
tronic formats. That capacity, in turn, has 
made it efficient for researchers to develop 
large-scale historical and comparative census 
projects, like the North American Population 
Project, http://www.nappdata.org/napp/, and 
IPUMS International. Thus, though it may seem 
counterintuitive, one can expect an increas- 
ing flow of data from the pre-electronic past 
to become available in the years ahead, and 
thus researchers should be encouraged to think 
about how their analyses could be enhanced if 
existing, but inaccessible, historical census data 
were added to the resources of the social sci- 
ences. Envisioning more information from cen- 
suses in the past (my apologies to Edward Tufte) 
is the first step in producing the data. 


³ See, for example, Bakhtior Islamov, “Central
Asian Population in Historical Perspectives,”
http://www.ier.hit-u.ac.jp/COE/Japanese/Newsletter/No.14.english/Islamov.htm,
accessed 11/22/2006; Elena Glavatskaya, “Ethnic
categories in the 1926 Russian census,”
http://www.ddb.umu.se/cbs/workshop2006/abstract.htm#glavatskaya;
David Anderson and Konstantin
Klokov, “The 1926 Siberian Polar Census and
Contemporary Indigenous Land Rights in Western
Siberia,” http://www.abdn.ac.uk/anthropology/polarcensuslandrights.shtml.

⁴ United Nations Statistics Division, Demographic and
Social Statistics, Population and Housing Censuses,
http://unstats.un.org/unsd/demographic/sources/census/default.aspx


Chapter 3


Repeated cross-sectional research: the 
general social surveys 
Tom W. Smith 


1 Introduction 


The National Data Program for the Social 
Sciences (NDPSS) is a social indicators and data 
diffusion program. Its basic purposes are (1) 
to gather and disseminate data on contempo- 
rary American society in order to (a) monitor 
and explain change and stability in attitudes, 
behaviors, and attributes and (b) examine the 
structure and functioning of society in gen- 
eral as well as the role played by various sub- 
groups; (2) to compare the United States to other 
societies in order to (a) place American soci- 
ety in comparative perspective and (b) develop 
cross-national models of human society; and 
(3) to make high-quality data easily accessible 
to scholars, students, policymakers, and others 
with minimal cost and waiting. 

These purposes are accomplished by the 
regular collection and distribution of the 
National Opinion Research Center (NORC) Gen- 
eral Social Survey (GSS) and its allied sur- 
veys in the International Social Survey Program 
(ISSP). Both the GSS and the ISSP surveys have 
been efficiently collected, widely distributed, 
and extensively analyzed by social scientists 
around the world. 


2 Organization 


The NDPSS is directed by James A. Davis, 
Tom W. Smith, and Peter V. Marsden. From 
1972 to 1982 the GSS was advised by a Board 
of Advisors and starting in 1978 a Board of 
Methodological Advisors. In 1983 at the behest 
of the National Science Foundation (NSF) these 
groups were combined to form a new Board 
of Overseers. The Board provides guidance to 
the GSS, forms linkages to the various research 
communities, spearheads the development of 
topical modules, approves the content of each 
survey, and evaluates the work of the project. 


3 Data collection: 1972-2004 


Since 1972 the GSS has conducted 25 inde- 
pendent, cross-sectional surveys of the adult 
household population of the United States and 
in 1982 and 1987 carried out oversamples of 
Black Americans. As Table 3.1 details, there 
have been a total of 45,803 respondents inter- 
viewed from the cross-sections, plus 707 Black 
respondents from the two oversamples. 

While the population sampled has remained
constant, transitional sample designs have been
employed three times: in 1975-76 to calibrate
the shift from the original block-quota sample to
the full-probability design utilized since 1977,
in 1983 when the 1970 NORC sample frame was
compared with the new NORC sample frame
based on the 1980 census, and in 1993 when
the 1980 NORC sample frame and the new 1990
NORC sample frame based on the 1990 census
were used. The 1990 sample frame was utilized
through 2002. A new sample frame based on
the 2000 census was introduced in 2004 (Davis,
Smith, and Marsden, 2005).

By using a strict, full-probability sample
design, rigorous field efforts, and extensive
quality control, the GSS produces a high-quality,
representative sample of the adult population
of the United States. The GSS response
rate has generally been in the upper 70s, with
a high in 1993 of 82.4%. However, the GSS
response rate has declined in recent years to just
over 70%. This rate is higher than that achieved
by other major social science surveys and 35-45
percentage points higher than the industry
average (Council for Marketing and Opinion
Research, 1998; Krosnick, Holbrook, and Pfent,
2003).

In order to accommodate more questions, 
the GSS employs a questionnaire design under 
which most questions are asked of only a subset
of respondents. From 1972 to 1987, that
was accomplished with a rotation design under
which questions appeared in two out of every
three years. In 1988, the GSS switched from an
across-survey rotation design to a split-ballot 
design. Under this design questions are asked 
every year, but only on two of three subsam- 
ples. Over a three-year period, questions that 
would have appeared on two surveys with a 
total of 3000 respondents (2 x 1500) under 
the old rotation design, now appear on two- 
thirds subsamples on all three surveys for a 
total of 3000 respondents (3 x 1000). This shift 
eliminated the problem of periodic gaps in the 
annual time series and facilitated time-series 
analysis (Davis, Smith, and Marsden, 2005). 
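
The equivalence between the two designs can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the GSS documentation: the function name is hypothetical, and the sample sizes (1500 per survey, 1000 per two-thirds subsample) are the rounded figures quoted above rather than actual case counts.

```python
def cases_over_period(surveys_with_item: int, respondents_asked: int) -> int:
    """Total respondents asked a given item over a multi-year period."""
    return surveys_with_item * respondents_asked

SURVEY_SIZE = 1500      # rounded size of one annual GSS, as quoted in the text
YEARS_IN_PERIOD = 3

# Across-survey rotation (1972-1987): item appears on 2 of every 3 surveys
# and is asked of every respondent on those surveys.
rotation_total = cases_over_period(2, SURVEY_SIZE)
rotation_gap_years = YEARS_IN_PERIOD - 2

# Split-ballot (from 1988): item appears on all 3 surveys,
# but only on two of the three subsamples (two-thirds of respondents).
split_ballot_total = cases_over_period(3, SURVEY_SIZE * 2 // 3)
split_ballot_gap_years = YEARS_IN_PERIOD - 3

print(rotation_total, split_ballot_total)
print(rotation_gap_years, split_ballot_gap_years)
```

Under these assumptions the number of cases per item is the same in both designs; what changes is that the split-ballot design leaves no year without an observation.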


Starting in 1994, the GSS switched to a biennial, 
double-sample design. In effect the 1994 GSS 
was two surveys in one with an A sample of 
1500 representing the “regular” 1994 GSS and 
a B sample of 1500 representing the “missing” 
1995 GSS. The double-sample design literally 
combines two separate GSSs with distinct top- 
ical and ISSP modules into one field operation 
(and similarly for the subsequent pairs of years). 


3.1 Components 


The GSS is divided into five components: 
(1) the replicating core, (2) topical modules, (3) 
cross-national modules, (4) experiments, and 
(5) reinterviews and follow-up studies. In recent 
years the replicating core has taken up half of 
the interviewing time and the topical, cross- 
national, and supplemental modules take up 
the other half. Experiments are done within 
either the core or the modules, and reinterviews 
and follow-up studies involve additional inter- 
viewing after the GSS has been completed. 


Replicating core 
The replicating core consists of questions that 
regularly appear in surveys either as full- 
coverage items or on subsamples. The content 
of the core is periodically reviewed by the 
PIs and Board of Overseers to ensure that the 
content remains relevant and up-to-date. Currently, 
the replicating core makes up about half 
of the overall length of the GSS and consists 
of about one-third demographic questions and 
two-thirds attitudes and behaviors. The repli- 
cating core forms the basis for the trend analysis 
and pooling of cases for subgroup analysis. 
The GSS is intentionally wide-ranging in its 
contents, with 4624 variables in the 1972-2004 
cumulative file. One needs to peruse the 
GSS Cumulative Codebook (Davis, Smith, 
and Marsden, 2005) or the online version 
at http://www.icpsr.umich.edu/cgi-bin/bob/newark?study=4295 
to fully appreciate the 
scope of the GSS. 

The GSS is different from most surveys in 
the wide variety of demographics included 
and the detail in which they are asked and 
coded. In addition to covering the extensive 
background variables on the respondent’s cur- 
rent status, the GSS has extensive informa- 
tion on the respondent’s family of origin and 
parental characteristics. Among the family of 
origin items are questions on the intactness 
of families (and reasons for “broken homes”), 
number of siblings, religion, region, and com- 
munity type. Additionally, parental variables 
include mother’s and father’s education, church 
attendance, occupation, and industry. There are 
also many questions about spouses. 

In addition, measures are usually very 
detailed. For example, occupation and industry 
are coded using both the census three-digit 
classification codes and the four-digit International 
Standard Classification of Occupations; there are 
two measures of occupational prestige; education 
is coded both as number of years in school and as 
highest degree obtained; three community type 
measures are included; and up to three ethnic and 
racial identities are coded. 

Besides the demographics, the core items 
cover a variety of behaviors, personal evalu- 
ations, and attitudes about central social and 
political issues from death (e.g., capital punish- 
ment, suicide, euthanasia) to taxes (as a redistri- 
bution measure, paying too much?). Among the 
many topics covered are abortion, civil liberties, 
confidence in institutions, crime and punish- 
ment, government-spending priorities, poverty 
and inequality, intergroup relations, religion, 
and women’s rights. 


Topical modules 

Topical modules (special sections on a partic- 
ular theme) first appeared in 1977 and have 
been an annual feature since 1984. The topical 
modules are designed to facilitate both inno- 
vation and greater depth. They introduce new 
topics not previously investigated by the GSS 
and cover existing topics in greater detail with 


more fully-specified models. The original con- 
cept for a module may come from the prin- 
cipal investigators, the Board of Overseers, or 
other interested scholars. The themes covered 
in major modules are listed in Table 3.1. 


Cross-national modules 

The GSS has spurred cross-national research by 
inspiring other nations to develop similar 
data-collection programs (e.g., the ALLBUS 
(Germany), British Social Attitudes, National Social 
Science Survey (Australia), Taiwan Social 
Change Study, Polish General Social Survey, 
Japanese General Social Survey, Korean General 
Social Survey, and Chinese General Social 
Survey) (Smith, Koch, Park, and Kim, 2006b) 
and by organizing these and other programs into 
the ISSP (see www.issp.org). 

The fundamental goal of ISSP is to study 
important social and political processes in com- 
parative perspective. In addition, by replicating 
earlier modules, ISSP not only has a cross- 
national perspective, but also an over-time per- 
spective. With ISSP one can both compare 
nations and test whether similar social-science 
models operate across societies, and also see if 
there are similar international trends and 
whether parallel models of societal change 
operate across nations. Thus, by combining 
an across-time with a cross-national design, 
ISSP incorporates two powerful perspectives 
for studying societies. 

ISSP evolved from a bilateral collaboration 
between the Allgemeine Bevölkerungsumfrage 
der Sozialwissenschaften (ALLBUS) of the 
Zentrum für Umfragen, Methoden und Analysen 
(ZUMA) in Mannheim, West Germany and 
the GSS of NORC, University of Chicago. In 
1982 and 1984 ZUMA and NORC devoted a 
small segment of the ALLBUS and GSS to a 
common set of questions on job values, impor- 
tant areas of life, abortion, feminism, class dif- 
ferences, equality, and the welfare state. 

Table 3.1 Design features of the GSS 1972-2004 

Year | Sample size | Sample type | Response rate % | Item rotation | Experimental forms | Reinterviews | Topical modules | International modules 
1972 | 1613 | BQ | - | None | None | Two waves | None | None 
1973 | 1504 | BQ | - | AS | Two forms | Three waves | None | None 
1974 | 1484 | BQ | - | AS | Two forms | Three waves | None | None 
1975 | 1490 | 1/2 BQ; 1/2 FP | -; 75.6 | AS | Split sample | None | None | None 
1976 | 1499 | 1/2 BQ; 1/2 FP | -; 75.1 | AS | Two forms + split sample | None | None | None 
1977 | 1530 | FP | 76.5 | AS | None | None | Race, abortion, feminism | None 
1978 | 1532 | FP | 73.5 | AS | Two waves | None | None | None 
1980 | 1468 | FP | 75.9 | AS | Three forms | None | None | None 
1982 | 1506 | FP | 77.5 | AS | Two forms | None | Military | ZUMA 
1982B | 354 | FP | 71.7 | AS | Two forms | None | Military | ZUMA 
1983 | 1599 | 70FP; 80FP | 79.4 | AS | Two forms + split sample | None | None | ZUMA 
1984 | 1473 | FP | 78.6 | AS | Three forms | None | None | ZUMA 
1985 | 1534 | FP | 78.7 | AS | Two forms | None | Social networks | ISSP 
1986 | 1470 | FP | 75.6 | AS | Two forms + vignettes | None | Welfare | ISSP 
1987 | 1466 | FP | 75.4 | AS | Three forms | Political tolerance | Political participation | ISSP 
1987B | 353 | FP | 79.9 | AS | Three forms | Political tolerance | Political participation | ISSP 
1988 | 1481 | FP | 77.3 | SB | Two forms | Cognitive | Religion | ISSP 
1989 | 1537 | FP | 77.6 | SB | Two forms (a) | Methods/Health (b) | Occupational prestige | ISSP 
1990 | 1372 | FP | 73.6 | SB | Two forms | Health | Intergroup relations | ISSP 
1991 | 1517 | FP | 77.8 | SB | Two forms | ISSP 1992 | Work organizations | ISSP 
1993 | 1606 | FP | 82.4 | SB | Two forms | None | Culture | ISSP 
1994 | 2992 | FP | 77.8 | DSB | Two forms | None | Family mobility; Multiculturalism | ISSP 
1996 | 2904 | FP | 76.1 | DSB | Two forms + vignettes | Parents of students | Mental health; Emotions; Gender; Market exchange | ISSP 
1998 | 2832 | FP | 75.6 | DSB | Two forms + vignettes | Health use knowledge | Religion; Job experiences; Health and mental health; Medical ethics; Culture; Inter-racial friendships | ISSP 
2000 | 2817 | FP | 70.0 | DSB | Two forms | Internet use | Religion; Computers; Multi-ethnic; Health status; Freedom | ISSP 
2002 | 2765 | FP | 70.1 | DSB | Two forms | Worker health | Altruism; Internet; Intergroup relations; Quality of work; Worker pay; Adulthood; Doctors; Mental health; The arts | ISSP 
2004 | 2817 | FP | 70.4 | DSB | Two forms | Voluntary associations | Altruism; Internet; Negative events; Genes/Environment; Religious change; Guns; Social networks/Voluntary groups; Alcohol use; Workplace stress/Violence | ISSP 

(a) For the occupational prestige module 12 subsamples were used. 
(b) The 1990 health reinterview used 1989 and 1990 GSS respondents. 

B = Black oversample 
BQ = Block quota sampling 
FP = Full probability sampling 
AS = Across-survey rotation 
SB = Split-ballot rotation 
DSB = Double sample, split-ballot rotation 


Meanwhile, in late 1983 the National Centre 
for Social Research, then known as Social and 
Community Planning Research (SCPR), which 
was starting the British Social Attitudes Survey, 
secured funds for meetings for international 
collaboration. Representatives from ZUMA, 
NORC, SCPR, and the Research School of Social 
Sciences, Australian National University, orga- 
nized ISSP in 1984 and agreed to (1) jointly 
develop topical modules covering important 
social science topics, (2) field the modules as 
supplements to the regular national surveys 
(or a special survey if necessary), (3) include 
an extensive common core of background vari- 
ables, and (4) make the data available to the 
social-science community as soon as possible. 

Each research organization funds all of its 
own costs. There are no central funds. Coor- 
dination is supplied by one nation serving as 
the secretariat. The United States served as the 
secretariat from 1997 to 2003. 

Since 1984, ISSP has grown to 40 nations, 
the founding four—Germany, the United States, 
Great Britain, and Australia—plus Austria, 
Brazil, Bulgaria, Canada, Chile, Croatia, Cyprus, 
the Czech Republic, Denmark, the Dominican 
Republic, Finland, Flanders, France, Hungary, 
Ireland, Israel, Japan, Korea, Latvia, Mexico, 
the Netherlands, New Zealand, Norway, 
the Philippines, Poland, Portugal, Russia, 
Slovakia, Slovenia, South Africa, Spain, 
Sweden, Switzerland, Taiwan, Uruguay, and 
Venezuela. In addition, East Germany was 
added to the German sample upon reunifica- 
tion. Past members not currently active include 
Bangladesh and Italy. In addition, a number 
of non-members have replicated one or more 
ISSP modules. This includes Argentina (Buenos 
Aires metro area only), Lithuania, and Singa- 
pore. 

ISSP maintains high standards of survey 
research. Each nation uses full-probability sam- 
pling, carefully monitors all phases of the data 
collection, and cleans and validates the data. 
The ISSP’s Central Archive further checks all 
data archived by the member nations. Countries 
applying for membership answer a series of 


standard questions about methodology and sur- 
vey procedures. Only once the secretariat has 
received satisfactory responses to all questions 
is a country’s membership application consid- 
ered by ISSP. Each country reports to ISSP its 
methods and various technical details such as 
its response rate. To check on the representa- 
tiveness of the sample, each country compares 
distributions on key demographics from ISSP 
surveys to the best data sources in their respec- 
tive countries. 

ISSP modules have covered the following 
topics: (1) Role of Government (1985, 1990, 
1996, 2006),¹ (2) Social Support and Networks 
(1986, 2001), (3) Social Inequality (1987, 
1992, 1999), (4) Gender, Family, and Work 
(1988, 1994, 2002), (5) Work Orientation (1989, 
1997, 2005), (6) Religion (1991, 1998, 2008), 
(7) Environment (1993, 2000), (8) National 
Identity (1995, 2003), (9) Citizenship (2004), and 
(10) Leisure Time (2007). 


Experiments 
Experimental forms have always been a regu- 
lar part of the GSS. The GSS has used split 
samples in 1973, 1974, 1976, 1978, 1980 and 
1982-2004. They have been an integral part of 
the GSS’s program of methodological research. 
Dozens of experiments have examined differ- 
ences in question wording, response categories, 
and context (Davis, Smith, and Marsden, 2005). 
Experiments are carried out as part of the 
replicating core, topical modules, and supple- 
ments. In some years the experiments consist 
of additional questions not regularly appear- 
ing on the GSS, such as the interracial friend- 
ships experiments in 1998 and the wording and 
response-order experiments on genetic screen- 
ing items in 1991 and 1996. Most of the time, 
however, the experiments compare a variant 
wording or order with the standard GSS wording 
and/or order being the control. Examples 
are the experiments on measuring race and 
ethnicity in 1996 and 2000. 

¹ ISSP replication modules repeat two-thirds of their 
content from earlier rounds. 

In addition, there have often been exper- 
iments within topical modules. For exam- 
ple, experiments were conducted as part of 
the 1986 factorial-vignette study of welfare, 
the occupational-prestige study in 1989, the 
1990 intergroup-relations module with wording 
experiments to test the impact of class 
versus racial references, the 1994 multicul- 
turalism module with various formulations of 
affirmative action policies, the 1996 mental- 
health module with 18 different versions of five 
basic vignettes (90 versions in all) to examine 
stigmatization of troubled individuals, the 1996 
gender module, the 1998 factorial vignettes 
on terminal-care decisions, the 2000 health- 
status and computer-use modules, and the 2002 
vignette studies of the mental health of children 
and physician-patient communications. 
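
The structure of the 1996 mental-health vignettes (18 versions of each of five basic vignettes) can be illustrated by enumerating the full set of versions. This is only a sketch: the vignette labels and the random-assignment helper are hypothetical placeholders, not the wording or the assignment procedure actually used on the GSS.

```python
from itertools import product
import random

# Placeholder labels for the five basic vignettes (hypothetical names).
BASE_VIGNETTES = [f"vignette_{i}" for i in range(1, 6)]
# Each basic vignette was fielded in 18 different versions.
VERSIONS = list(range(1, 19))

# Every (vignette, version) combination.
all_versions = list(product(BASE_VIGNETTES, VERSIONS))

def assign_vignette(rng: random.Random) -> tuple:
    """Draw one vignette version at random for a respondent (illustrative only)."""
    return rng.choice(all_versions)

rng = random.Random(0)  # fixed seed so the example is reproducible
example_assignment = assign_vignette(rng)
print(len(all_versions))
```

Crossing five vignettes with 18 versions yields the 90 versions mentioned in the text.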


Reinterviews and follow-ups 

GSS respondents have been reinterviewed as 
part of both methodological and substantive studies. 
The methodological uses have included 
studies of reliability, cognition, and wording 
and context. In 1972, 1973, 1974, and 1978, 
test/retest studies of item stability and relia- 
bility were conducted (Smith and Stephenson, 
1979; Alwin and Krosnick, 1989). In 1988, cog- 
nitive scientists at the University of Chicago 
expanded the normal GSS validation effort and 
added recall questions about the timing and 
content of the initial interview. Reinterview 
reports were then validated against the known 
information on date and content and models of 
memory were developed to explain the discrep- 
ancies. Telescoping or forward biasing in the 
reporting of past events was documented and 
this was related to the placing of upper limits 
on time estimates and a tendency to round to 
the next lower or complete time period, e.g., 
two weeks, one month (Huttenlocher, Hedges, 
and Bradburn, 1990). 


In 1990, NORC and the University of 
Chicago supported a seminar on survey- 
research methods to study wording and context 
effects. About a third of the 1989 GSS cases 
were recontacted by phone. Comparisons were 
made between standard and variant questions 
across subsamples on the reinterview, between 
standard questions on the GSS and the reinter- 
view, and between standard questions on the 
GSS and variant questions on the reinterviews. 
As in the earlier GSS reinterview studies, a 
notable degree of instability in responses was 
found (Junn and Nie, 1990; Ramirez, 1990). As 
expected, attitudinal items showed more vari- 
ation than demographics. The less educated, 
those with no earned income, and older respon- 
dents showed the greatest differences in their 
responses. 

The GSS has also served as a list sample for 
several substantive studies. GSS respondents 
are a representative sample of adults living in 
households and can be used as a list or sample 
frame for a follow-up study. While one must 
naturally adjust for any bias from panel mor- 
tality, the GSS offers an excellent frame for a 
follow-up study. First of all, since respondent 
names, addresses, and telephone numbers are 
known, GSS respondents are relatively easy to 
recontact. Second, a rich amount of information 
is known about respondents. This information 
can be used in several ways. For unchanging 
attributes like year of birth, income during the 
past year, or nationality, one can link the data 
obtained on the GSS to the follow-up study and 
thereby free up time on the follow-up study. 
Third, one can use any GSS variables to study 
panel mortality and, if necessary, adjust for 
panel mortality bias. 

There have been seven substantive reinter- 
views of GSS respondents. The first in 1987 
contained questions on political tolerance and 
Cloninger’s Tridimensional Personality Scale. 
The second reinterview study was the 1990 
National Survey of Functional Health Status. 
Respondents from the 1989/1990 GSS, plus an 
additional sample of people 65+ from these 
households, were contacted in late 1990 and 
early 1991. In 1994-95, respondents were 
reinterviewed again. In the third reinterview study, 
in 1992, respondents to the 1991 GSS were 
reinterviewed in order to collect information for 
the ISSP social inequality module and study 
changes in negative life events over time. The 
fourth reinterview in 1997 contacted parents 
of students in grades 1-8 from the 1996 GSS. 
The fifth reinterviewed 1998 GSS respondents 
on knowledge about and attitudes towards the 
role of behavioral interventions and social-science 
treatments in health care. The sixth 
reinterview in 2001 was an extension of the 
2000 topical module on computers and the 
Internet. The latest reinterview is of employed 
people on the 2002 GSS. In 2002-2003 they 
were reinterviewed about work-related health 
issues. 

The GSS has also served as the source for 
six special follow-up studies, most involv- 
ing hypernetwork sampling. First, in 1991 a 
record of the employer of respondents and 
spouses was collected. These employers were 
contacted as part of a study of work organiza- 
tions, the National Organizations Study (NOS). 
This information can be analyzed in its own 
right as well as linked back to the attitudes of 
the original GSS respondents. Second, in 1994 
a random sibling was selected for an interview 
in order to study social mobility within sibsets. 
Third and fourth, in 1998 and 2000 samples of 
respondents’ congregations were created. In 1998 a 
follow-up survey of these congregations was 
fielded. For 2000 there were follow-up surveys 
both of congregations and of people attending 
services of these congregations. Fifth, as 
with the 1991 NOS, on the 2002 GSS informa- 
tion was collected on respondents’ employers 
(spouses’ employers were not covered in 2002). 
Finally, in 2006 the National Voluntary Asso- 
ciations Study contacted groups that 2004 GSS 
respondents belonged to. 


4 Publications by the user 
community 


As of 2005, the GSS was aware of over 
12,000 research uses of the GSS in articles, 
books, dissertations, etc. Most users (82%) 
have been academics with college affiliations. 
Other users include scholars at research cen- 
ters, foundations, and related organizations 
(12%); government researchers (1%); and others 
and unknown (5%). Among the academics, 
sociologists predominate (56%), followed by 
political scientists (15%), law and criminal 
justice researchers (6%), psychologists (5%), 
economists (4%), physicians and other health 
professionals (5%), statisticians (3%), business 
management professors (2%), other social sci- 
entists (e.g. anthropologists and geographers) 
(2%), and non-social scientists and miscella- 
neous (2%). 

Moreover, with the exception of the census 
and its Current Population Survey, the GSS is 
the most frequently used dataset in the top 
sociology journals.² As Table 3.2 shows, in the top 
sociology journals the GSS has been used in 145 
articles, more often than the next five most 
frequently used datasets combined. 


Table 3.2 Most frequently used datasets in leading 
sociology journals, 1991-2003 


Census/CPS 180 
GSS 145 
National Longitudinal Survey of Youth 43 
Panel Survey of Income Dynamics 36 
National Survey of Families and Households 28 
National Educational Longitudinal Survey 18 
Adolescent Health 14 
High School and Beyond 13 
National Election Studies 13 
Occupational Change in a Generation II 10 


² We used the American Sociological Review, the 
American Journal of Sociology, and Social Forces. 
They are the consensus choice as the top general 
sociological journals (Allen, 1990; Kamo, 1996; Presser, 
1984). 
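
The comparison with the next five datasets can be verified directly from the counts in Table 3.2. The short sketch below simply re-enters those counts; the variable names are illustrative.

```python
# Article counts copied from Table 3.2 (leading sociology journals, 1991-2003).
usage = {
    "Census/CPS": 180,
    "GSS": 145,
    "National Longitudinal Survey of Youth": 43,
    "Panel Survey of Income Dynamics": 36,
    "National Survey of Families and Households": 28,
    "National Educational Longitudinal Survey": 18,
    "Adolescent Health": 14,
    "High School and Beyond": 13,
    "National Election Studies": 13,
    "Occupational Change in a Generation II": 10,
}

# Rank datasets from most to least used.
ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ranked]

# Sum the five datasets ranked immediately below the GSS.
gss_position = names.index("GSS")
next_five = ranked[gss_position + 1 : gss_position + 6]
next_five_total = sum(count for _, count in next_five)

print(usage["GSS"], next_five_total)
```

The five datasets ranked after the GSS account for 139 articles in total, against 145 for the GSS alone.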


5 Teaching and other uses 


The GSS is widely used in teaching at 
the undergraduate and graduate levels. About 
250,000 students annually take courses that uti- 
lize the GSS and nearly 400 college textbooks 
use GSS data. 

The GSS has also been used outside the aca- 
demic community by the government, media, 
non-profits, and business community. Taking 
the federal government as an example, the GSS 
is regularly used by (1) the Congressional Research 
Service of the Library of Congress, (2) 
the Science and Engineering Indicators series 
of NSF, (3) the Sourcebook of Criminal Justice 
Statistics of the Bureau of Justice Statistics, and 
(4) Statistical Abstract of the United States of 
the Bureau of the Census. GSS data have been 
cited in 20 briefs to the US Supreme Court. 


6 Contributions to knowledge 


Because of the wide-ranging content and exten- 
sive level of usage of the GSS, it is effectively 
impossible to describe all of the results from 
the thousands of publications covering dozens 
of fields. Instead, the GSS’s contributions to basic 
knowledge will be considered regarding (1) how 
key design features of the GSS have promoted 
social-science research, (2) the study of soci- 
etal change, (3) cross-national research, and (4) 
methodological research. 


6.1 Design features of the GSS and research 


Several key aspects of the GSS study 
design greatly facilitate research opportunities. 
These include: (1) replication, (2) breadth of 
substantive content, (3) extensive and detailed 
demographics, (4) providing a standard of com- 
parison for other surveys, and (5) depth and 
innovation in the topical modules. 


Replication is the most important design 
feature of the GSS. Replication is necessary for 
two crucial research goals of the GSS: (1) the 
study of societal change and (2) the study of 
subgroups. A sample of GSS research publica- 
tions since 1995 shows that 60% of all GSS 
usages make use of the replication feature by 
utilizing two or more years of the GSS. 

The GSS core is based on the simple princi- 
ples that (1) the way to measure change is not to 
change the measure (Smith, 2005b), and (2) the 
optimal design for aggregating cases is a repli- 
cating cross-section. Besides replication within 
the core to study societal change and subgroups, 
the GSS employs replication in several other 
ways. 

First, many of the variables used on the GSS 
were adopted from baseline surveys with obser- 
vations going back as far as the 1930s and 1940s. 
As a result, hundreds of GSS trends extend back 
before the inception of the GSS in 1972 (Smith, 
1990). 

Second, several topical modules have been 
designed to replicate seminal studies. For 
example, the 1987 module on sociopolitical 
participation replicated key segments of the 
1967 Verba—Nie study of political participation 
(Verba and Nie, 1972); the 1989 occupational 
prestige module updated the NORC prestige 
studies of 1963-1965 (Nakao and Treas, 1994); 
and the 1996 Mental Health module drew on 
Star’s seminal study from the early 1950s 
(Phelan, et al., 2000). Even when not primarily 
a replication, other modules, such as the mod- 
ules in 1990 on intergroup relations, in 1991 
on work organizations, in 1994 on multicultur- 
alism, in 2000 on health functioning, and in 
2002 and 2004 on empathy and altruism, have 
adopted key scales from earlier studies. 

Third, there is a social trends component in 
ISSP. Cross-national modules are periodically 
repeated to measure societal change in a com- 
parative perspective. 

Finally, experiments have been replicated 
over time. 

Replication is first and foremost used to 
study societal change. An analysis of recent 
publications (from 1995 on) shows that 39% of 
all research examines trends. Examples of this 
body of research are presented in the section on 
research findings below. 

Replication is also essential for the pool- 
ing of cases to study cultural subgroups and 
understand the great complexity and diversity 
of American society. For example, the 1972- 
2004 GSSs have 973 Jews, 2850 holders of 
graduate-level degrees (and 850 with 20+ years 
of schooling), 498 French Canadians, 732 reg- 
istered nurses, and 72 economists. The GSS 
has been used not only to study all of the 
major social groups (e.g., men and women, 
Blacks and Whites, the employed, etc.), but 
also to examine much smaller groups and 
combinations of groups. The GSS has been 
used to focus on and examine an incredi- 
bly wide range of groups, including: Amer- 
ican Indians (Hoffman, 1995), art museum 
attendees (DiMaggio, 1996), engineers and sci- 
entists (Smith, 2000; Weaver and Trankina, 
1996), farmers (Drury and Tweeten, 1997), part- 
time workers (Kalleberg, 1995), schoolteachers 
(Lindsey, 1997), the self-employed and busi- 
ness owners (Kingston and Fries, 1994), and 
veterans (Feigelman, 1994; Lawrence and Kane, 
1995). And among the combination of groups 
investigated are Black Catholics (Feigelman, 
Gorman, and Varacalli, 1991), the divorced 
elderly (Hammond, 1991), older rural resi- 
dents (Peterson and Maiden, 1993), and self- 
employed women (Greene, 1993; McCrary, 
1994). Moreover, in a number of instances 
subgroups were pooled into several time periods 
so that both trend and subgroup analysis was 
possible, for example, among Hispanics (Hunt, 
1999), Jews (Greeley and Hout, 1999; Smith, 
2005a), and schoolteachers (Walker, 1997). 
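
Why pooling matters for small subgroups can be seen from a rough margin-of-error calculation. The sketch below makes several simplifying assumptions that are not in the text: simple random sampling (the GSS's actual design effects are ignored), a proportion of 0.5, and the 973 Jewish respondents spread evenly over the 25 surveys listed in Table 3.1.

```python
import math

def margin_of_error(n: float, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

POOLED_N = 973    # Jewish respondents in the 1972-2004 cumulative file (from the text)
N_SURVEYS = 25    # surveys listed in Table 3.1; an even spread is assumed here

single_survey_n = POOLED_N / N_SURVEYS   # average subgroup cases per survey

moe_single = margin_of_error(single_survey_n)
moe_pooled = margin_of_error(POOLED_N)

# Pooling k equally sized samples shrinks the margin of error by sqrt(k).
improvement = moe_single / moe_pooled

print(round(single_survey_n, 1), round(improvement, 2))
```

Under these assumptions a single survey yields fewer than 40 cases for this subgroup, and pooling all surveys narrows the margin of error by a factor of five (the square root of 25), which is what makes subgroup analysis feasible.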

A second key design feature of the GSS is 
its wide-ranging content. The cumulative 1972-2004 
GSS dataset has 4624 variables, and typically 
850-1000 variables appear on each recent 


GSS. As a result, the GSS covers a wide range 
of topics and, as the Office of Inspector General 
of NSF has noted, attracts use from “scientists 
in almost every subfield of sociology and 
in numerous other social science disciplines” 
(Office of Inspector General, 1994). 

This allows investigators to test hypotheses 
across a large number of variables rather than 
being restricted to a handful of items. For exam- 
ple, Davis (2000) looked at trends on 81 items, 
Freese, Powell, and Steelman (1999) exam- 
ined birth order differences with 106 variables, 
Smith (2005a) considered ethnic and religious 
differences across 150 variables, and Greeley 
(1995) utilized 230 variables to study religion. 

A third key design feature is the GSS’s rich 
and detailed set of demographics. As discussed 
above, the GSS has background variables on 
respondents, spouses, household, and parents, 
and many multiple measures on such variables 
as race/ethnicity, occupation, income, and com- 
munity type. 

Finally, the GSS serves as a standard for many 
other surveys. It is widely used as a national 
norm for comparison with student, local, state, 
international, and special samples. 


6.2 Societal change 


The GSS is the single best source of trends in 
social attitudes available. The 1972-2004 GSSs 
have time trends of over 1400 variables with 
hundreds spanning 30+ years. As Nie, Junn, 
and Stehlik (1996) have noted, the GSS “is the 
only continuous monitoring of a comprehen- 
sive set of non-economic attitudes, orientations, 
and behaviors in the United States today.” Or 
as Morin (1998) characterized it, the GSS is 
“the nation’s single most important barometer 
of social trends.” 

Many general studies of societal change have 
been carried out. DiMaggio, Evans, and Bryson 
found little support for the simple attitude 
polarization hypothesis. Most scales and items 
did not become more polarized under several 
definitions, although some important but isolated 
examples did emerge (DiMaggio, Evans, 
and Bryson, 1996; DiMaggio and Bryson, forthcoming). 
Likewise, Hochschild (1995) found 
convergence regarding the “American Dream” 
across race and class lines. Smith (1994; 1997) 
and Davis (1995; 2000) found that most soci- 
etal change in attitudes is (1) slow, steady, and 
cumulative, and (2) that most societal change 
is explained (in decreasing order of impor- 
tance) by (a) cohort-education turnover mod- 
els, (b) episodic shocks (e.g., wars and political 
scandals), and (c) structural changes in back- 
ground variables. 

Many studies of change within particular top- 
ics have also been conducted. One of the top 
areas is social capital. Putnam (2000) and others 
(Crawford and Levitt, 1998) have argued that 
social capital is eroding and this is seriously 
undermining the smooth operation of the polit- 
ical system and society in general. Ladd (1996; 
1999) counters that the change is both exaggerated 
and not so much a decline as a reconfiguration 
of civil society. Similarly, Paxton 
(1999) finds a mixed pattern of change with a 
decline in individual trust, no general decline 
in trust in institutions, and no decline in vol- 
untary associations. 

Intergroup relations is another major area of 
analyzing trends. Research indicates that inter- 
group relations are multidimensional and mul- 
tiple indicators are needed to track attitudes 
towards many different aspects (e.g., target 
groups, principles, policies, role of govern- 
ment, etc.). Schuman and colleagues (Schuman 
and Krysan, 1999; Schuman, Steeh, Bobo, and 
Krysan, 1997) have demonstrated that trends 
have proceeded at very different rates, with 
quick and large-scale shifts towards the prin- 
ciple of racial equality at one extreme to little 
or no gain in support for concrete measures to 
ensure equal treatment at the other end. 

Societal changes in family values have also 
been frequently examined and show a massive 
shift from traditional to modern attitudes 
and practices. Smith (1999) showed that many 
family values have become less traditional and 
that the changes in family values were both 
assisted by changes in family structure and in 
turn facilitated the shift in the composition 
of households. Popenoe and Whitehead (1999) 
focused on the declining centrality of marriage 
over the last generation. Alwin (1996) showed 
how the coresidence preferences of families 
changed both across time and across cohorts. 
Straus and Mathur (1996) found that support 
for both spanking and obedience in children 
declined. Brewster and Padavic (1998), Misra 
and Panigrahi (1995), and Rindfuss, Brewster, 
and Kavee (1996) isolated gender interaction 
and cohort effects as the top causes of shifts in 
gender role attitudes. 

Of course the GSS also covers trends in scores 
of other areas. For example, Davis and Robinson 
(1998) showed a notable shift in the class iden- 
tities of married couples with both husbands 
and wives increasingly using the wives’ characteristics 
in assessing their own class identity. 
Hunt (1999) indicated that Hispanics 
have become less Catholic both across time 
and across immigrant generations. Since Occu- 
pational Change in a Generation II in 1973, 
the GSS has been the main source of data on 
changes in intergenerational mobility. As Mare 
(1992) noted, “Except for the NORC General 
Social Survey (GSS), we have no standard vehi- 
cle for monitoring the process of social strat- 
ification...” Recent examinations of the trends 
in mobility include Davis (1994), Hauser (1998) 
and Hout (1997). 


6.3 Cross-national


With 19 completed and released modules and 
2335 usages, ISSP has produced a body of 
research that has been almost as wide-ranging 
and difficult to summarize as the GSS in 
general. (For the latest ISSP bibliography see 
www.issp.org) 




As a single example of the cross-national 
uses, consider the 1995-96 and 2003-2004 
national identity modules. They have been 
used to examine the shifting role of the nation 
state as its position has been changed both 
from above by regional and international orga- 
nizations (e.g., EU, NAFTA, UN, WTO) and 
from below by movements for autonomy and 
local self-government, and to determine the cul- 
tural identity and distinctiveness of individual 
countries (e.g., Hjerm, 1998; 2004; Jones, 2001; 
McCrone and Surridge, 1998; Peters, 2002). For 
example, Smith and Jarkko (1998) and Smith 
and Kim (2006) showed that national pride 
in ten domains was determined by a com- 
bination of objective conditions and a peo- 
ple’s understanding of their history. They also 
showed that national pride was uniformly 
lower among ethnic, racial, religious, linguistic, 
and regional minorities and that national pride 
has declined across birth cohorts in almost all 
countries. 


6.4 Methodological research 


The GSS gives the highest priority to maintain- 
ing data quality and minimizing measurement 
error. In part this has been carried out by the 
adoption of rigorous design and execution stan- 
dards (e.g., full-probability sampling, pretesting 
and careful item development, maintaining a 
high response rate, data validation, data clean- 
ing, etc.). In addition, this has been achieved 
by carrying out one of the most extensive pro- 
grams of methodological research in survey 
research. The project has 105 GSS Method- 
ological Reports that use both experimental 
and non-experimental designs to study virtu- 
ally all aspects of total survey error (Davis, 
Smith, and Marsden, 2005). Among the topics 
covered are: (1) the reliability and validity of 
behavioral reports; (2) test/retest reliability; (3) 
sample-frame comparability; (4) sensitive
topics; (5) third-person effects; (6)
education/age-cohort interactions; (7)
nonresponse bias; (8)


the measurement of race and ethnicity; (9) con- 
text effects; (10) question wording; (11) scale 
construction; (12) item nonresponse; and (13)
cross-national comparisons. 


7 Summary 


The GSS has aptly been described as a “national 
resource” (Firebaugh, 1997; Working Group 
on Large-Scale Data Needs in Luce, Smelser, 
and Gerstein, 1989), as a “core database” in 
both sociology and political science (Campbell, 
2001; Kasse, 2001), as a “public utility for the 
community at large” (Office of Inspector Gen- 
eral, 1994), as having “revolutionized the study 
of social change” (ICPSR, 1997), and as “a major 
source of data on social and political issues and 
their changes over time” (AAPOR Innovators 
Award, 2000). 

In order to serve the social-science
community, the GSS draws heavily upon that 
community of scholars in the selection and 
development of modules and items. Between 
the Board and developmental committees 
hundreds of researchers have participated 
in the design of GSS components. Then the 
GSS provides quick, equal, and easy access to 
the data which in turn leads to widespread 
utilization of the data by thousands of social 
scientists and hundreds of thousands of their 
students. It is not only widely used in the 
United States, but especially through ISSP it is 
used by scholars around the world. The known 
GSS research usages number over 12,000. 
Usage has been especially strong in the top 
sociology journals where only data collected 
by the Bureau of the Census are used more 
frequently than the GSS. 

In sum, the GSS produces top-quality,
representative data for the United States and,
through ISSP, for many other countries on topics
of fundamental importance to the social sciences, is
extremely widely used in both teaching and 
research, and has considerably expanded the 
knowledge base in the social sciences in a very 
cost-effective manner. 
Chapter 4


Structuring the National Crime Victim 
Survey for use in longitudinal analysis 


Lawrence Hotchkiss and Ronet Bachman 


1 Introduction¹


There are many research questions that can only 
be answered with data that follow individuals 
over time. Research in the area of crime vic- 
timization is no exception. In fact, a burgeon- 
ing area of research in criminology is related 
to issues of recurring victimization both at the 
individual level (Farrell, Tseloni and Pease, 
2005; Lauritsen and Quinet, 1995; Menard, 
2000; Stevens, Ruggiero, Kilpatrick, Resnick 
and Saunders 2005) and the aggregate house- 
hold or neighborhood level as well (Bowers and 
Johnson, 2005; Farrell, Sousa and Weisel, 2002; 
Outlaw, Ruback and Britt, 2002). Although the 
National Crime Victimization Survey (NCVS) is 
most often utilized in a cross-sectional format 
or to estimate aggregate trends, it has the capac- 
ity for individual-level longitudinal analysis, 
because each sample unit (address) stays in the 
sample for three-and-one-half years. Despite the 
potential to reconfigure NCVS data files
longitudinally, very few researchers have
undertaken the task,
primarily because of the many challenges one 


1 The authors wish to thank Jeremy Shimer of the US 
Census Bureau and Thomas Zelenock of the ICPSR 
for valuable reviews and suggestions related to this 
chapter. 


encounters when doing so. Issues of attrition 
plague most surveys that attempt to track
individuals over time; however, the issue is even
more complex for the NCVS. For example, the 
NCVS samples residential addresses, not house- 
holds or individuals, so different household 
members may move in and out of an address, or 
a different household may move in altogether. 
Another stumbling block is related to multi- 
ple victimizations, particularly those that occur 
within the same month. The only information 
the NCVS collects about the time ordering in 
a given interview is the month of occurrence. 
Moreover, if a respondent has experienced six 
or more similar victimizations and can’t recall 
them separately, these incidents are recorded 
as one series crime. The only information the 
NCVS collects about the time ordering of series 
crimes is the number of incidents per quarter. 
These, along with other challenges, confront 
researchers eager to exploit the longitudinal 
structure of the NCVS. 


2 NCVS procedures 


The NCVS is an ongoing survey of personal 
and household victimizations designed to be 
representative of all persons living in noninsti- 
tutional households in the United States over 
12 years of age. Begun in 1973, the NCVS
was designed with four primary objectives: 
(1) to develop detailed information about vic- 
tims and consequences of crime, (2) to estimate 
the number and types of crimes not reported 
to the police, (3) to provide uniform measures 
of selected types of crimes, and (4) to per- 
mit comparisons over time and types of areas. 
The survey categorizes crimes as “personal” or 
“property.” Personal crimes include rape and 
sexual attack, robbery, aggravated and simple 
assault, and purse-snatching/pocket-picking, 
while property crimes include burglary, theft, 
motor vehicle theft, and vandalism. Each
respondent is asked a series of questions
designed to determine whether she or he was
victimized during the six-month period preceding
the first day of the month of the interview. 
A “household respondent” is also asked to 
describe crimes against the household as a 
whole (e.g., burglary, motor vehicle theft). An 
incident report for each victimization includes 
information such as type of crime, month, time 
of day, and location of the crime, informa- 
tion about the offender including the num- 
ber of offenders, perceived gang membership, 
their race, gender, and age, and the relationship 
between victim and offender, self-protective 
actions taken by the victim during the incident 
and results of those actions, consequences of 
the victimization including injuries sustained, 
type of property lost, whether the crime was 
reported to police and reasons for reporting or 
not reporting, and offender use of weapons, 
drugs, and alcohol. The NCVS also collects the 
respondent’s demographic information such as 
age, race, gender, income, and occupation. 
The NCVS uses a stratified, multistage clus- 
ter sampling design. The primary sampling 
units (PSUs) are counties, groups of smaller 
counties, and metropolitan areas. A sample 
of census-identified enumeration districts is 
selected from PSUs; these districts are geo- 
graphic areas that encompass approximately 
750-1500 persons and range in size from a 


block to hundreds of square miles. Enumeration 
districts are divided into clusters of approxi- 
mately four housing units. The final sampling 
procedure randomly selects clusters of housing 
units. The resulting sample consists of approx- 
imately 43,000 housing units and other living 
quarters. These housing units remain in the 
sample for a period of three-and-a-half years. 
Every six months, a new group of housing units 
replaces one-seventh of the housing units then 
in the sample (for a more detailed discussion,
see the NCVS codebook, available from ICPSR 
or the Bureau of Justice Statistics, US Depart- 
ment of Justice, 2006). 


3 Individual-level longitudinal 
analyses using the NCVS 


There are numerous research questions that 
could be investigated using these data in a 
longitudinal form. We will highlight a few 
studies that have successfully done so, each 
investigating different research questions. One 
of the first successful attempts to merge the 
individual files in the NCVS into a longitudinal 
format was published by Mark Conaway and 
Sharon Lohr (1994). They were interested in 
examining the factors related to the police 
reporting behavior of crime victims. Unlike 
most of the extant research on this topic that 
primarily utilizes information about the inci- 
dent, such as crime seriousness, Conaway and 
Lohr were interested in the effects of previous 
reporting behavior on future reporting behav- 
ior. Specifically, they investigated whether 
victims who reported positive experiences 
with the police in a previous interview would 
be more likely to report victimization again 
than respondents who either had negative 
experiences with reporting to the police in 
the past, or had never reported a victimization 
to police at all. After controlling for other 
characteristics of the crime and demographics 
of the respondent, they found that victims were 
more likely to report a violent crime to police if 
a previous victimization had been reported and
if police did routine follow-up activity or if an 
arrest was made or property was recovered. 

Most recently, researchers using NCVS data 
from 1998 to 2000 found that investigatory 
efforts by police in a prior victimization 
increased the likelihood of victims reporting 
an ensuing victimization only when the vic- 
tim, rather than someone else, reported the 
prior victimization (Xie, Pogarsky, Lynch, and 
McDowall, 2006). Xie and colleagues also found 
that an arrest after an individual was victim- 
ized in the past had no effect on whether the 
individual reported an ensuing victimization 
(2006). 

Repeat victimization has become a major 
focus of criminological researchers for several 
reasons. Perhaps foremost of these is the fact 
that offending behavior, victimizations, and 
offending locations each tend to cluster (Farrell, 
Tseloni, and Pease, 2005). Estimating the fac- 
tors related to repeat victimization, then, can 
also help illuminate patterns of offending and 
“hot spots” for crime. Information regarding 
repeat victimization can therefore inform crime
control policies as well as the development of 
criminological theory (Laycock, 2001). Repeat 
victimization has been found to account for 
a significant proportion of victimization in 
juveniles (Lauritsen and Davis Quinet, 1995; 
Menard, 2000), and for adults (Farrell, Tseloni, 
and Pease, 2005; Gabor and Mata, 2004). After 
examining the rates of repeat victimization 
in the NCVS, Graham Farrell and his col- 
leagues (2005) concluded that repeat victim- 
ization accounted for about one-quarter of all 
assaults and sexual assaults, about 20% of rob- 
beries and other personal thefts, and about 18% 
of burglaries. 

Lynn Ybarra and Sharon Lohr (2002) exam- 
ined estimates of repeat victimization using the 
NCVS and the effects of repeat victimization on 
attrition, particularly the effects of being
victimized by an intimate partner. Importantly,
they found that victims of intimate partner


violence were more likely to drop out of the sur- 
vey than nonvictims. This type of research has 
important implications for magnitude estimates 
of victimization in general, and for intimate 
partner violence in particular. If individuals 
who drop out of the survey (e.g., move or refuse 
to respond) are more likely to be repeat victims 
than are individuals who replace them, then 
cross-sectional estimates of victimization may 
be biased downwards. 

Catherine Gallagher (2005) examined the 
extent to which injured crime victims who 
sought medical care for their injuries were more 
or less likely to sustain injuries as the result 
of a future victimization. Specifically, she com- 
pared the probability of injury recurrence for 
injury victims of crime who sought medi- 
cal help for their injuries to the probability 
for those who had not sought medical help. 
Gallagher (2005), in fact, found that victims 
who received medical care for earlier injuries 
resulting from violence were at a decreased risk 
of sustaining a future violence-related injury 
even after controlling for the seriousness of 
the injury and medical coverage. Gallagher 
also found that past medical care more gen- 
erally protected individuals from future vio- 
lence. Although she was not able to isolate 
the mechanism for this finding, it does appear 
that something about medical treatment has a 
violence-reducing effect. Gallagher concludes 
that efforts should be made to secure medical 
treatment for victims of crime in general. 


3.1 Structure of the NCVS full 
hierarchical data files 


The National Crime Survey (NCS) was an ongo- 
ing longitudinal crime-victimization survey of 
households and persons. Begun in 1973, it 
underwent a major revision in the years lead- 
ing up to 1992 when it was renamed the 
National Crime Victimization Survey (NCVS). 
This chapter describes data associated with 
the full hierarchical files of the NCVS. The 
NCVS selects a new sample every three years, 
numbered 15 through 22 to date. Earlier sample
numbers are associated with the NCS. 

Addresses, not individuals or households,
are the units selected into each sample. Res- 
idents of selected addresses are scheduled to 
be interviewed seven times. Questions about 
victimization incidents refer to the six months 
prior to the interview, called the reference 
period. The first interview is used for “bound- 
ing” victimizations within the reference period 
to eliminate over-reporting errors due to tele- 
scoping. Telescoping refers to the tendency 
of respondents to report crimes that occurred 
before the reference period. Victimization inci- 
dents reported during the first interview are not 
used in the estimation of victimization rates. 
Instead, they are used to identify incidents 
reported during interview two that occurred 
before the six-month reference period. Data 
from the first interview are excluded from the 
public-use data files. There is, however, one 
exception to this. If a new household moves 
into a sample address, no bounding interview is 
conducted for this replacement household, and 
its first interview is included in the public-use 
data. Hence, household/person interview num- 
bers for nonreplacement households span the 
interval 2 through 7, inclusive, but interview 
numbers for replacement households vary over 
the interval 1 through 6, inclusive. 

Each NCVS sample is divided into six rota- 
tion groups, numbered 1 through 6, and each 
rotation group is divided into six panels, num- 
bered 1 through 6. Interview 1 of rotation-group 
1, panels 1 through 6 occurs during January 
through June, respectively, of the year the
sample is activated. The first interview of
rotation-group 2, panels 1 through 6 occurs each July
through December of the same year. The second 
interview of rotation group 1, panels 1 through 
6 occurs during July through December of year 
1. Hence the second interview of rotation-group 
1 occurs at the same time as the first interview 
of rotation-group 2. 


This staggered design extends to all six rota- 
tion groups and panels within rotation groups. 
Since each address is interviewed every six 
months and stays in the sample for seven inter- 
views, beginning with the last interview (7) of 
the first sample, seven rotation groups are inter- 
viewed simultaneously, some from one sam- 
ple and some from the next numbered sample 
in the sequence. But interviews from just six 
rotation groups appear in the data for a given 
date, because the bounding interviews are not 
included in the public-use files, and replace- 
ment households can be interviewed a maxi- 
mum of six times. 
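
The staggering rules described above can be turned into a small date calculation. The following Python sketch is illustrative only: the function name scheduled_interview and its arguments are hypothetical, and the activation year of 2001 for sample J21 is an assumption inferred from the rotation chart in Table 4.1 rather than a value stated in the NCVS documentation.

```python
def scheduled_interview(activation_year, rotation_group, panel, interview_no):
    """Return (year, month) in which an address is scheduled for an interview.

    Rules taken from the text: panels 1-6 of rotation group 1 start in
    January-June of the activation year, each later rotation group starts
    six months after the previous one, and interviews are six months apart.
    """
    if not (1 <= rotation_group <= 6 and 1 <= panel <= 6 and 1 <= interview_no <= 7):
        raise ValueError("rotation_group and panel must be 1-6, interview_no 1-7")
    months_from_january = (
        (panel - 1)                 # the panel fixes the month within a half-year
        + 6 * (rotation_group - 1)  # rotation groups are staggered by six months
        + 6 * (interview_no - 1)    # successive interviews are six months apart
    )
    year = activation_year + months_from_january // 12
    month = months_from_january % 12 + 1
    return year, month


# Assumed activation year for sample J21 (inferred from Table 4.1).
J21_ACTIVATION_YEAR = 2001

first = scheduled_interview(J21_ACTIVATION_YEAR, rotation_group=6, panel=1, interview_no=1)
print(first)  # (2003, 7): first interview of sample J21, panel 1, rotation-group 6
```

Under these assumptions the function reproduces the July 2003 start date for sample J21, panel 1, rotation-group 6 that is discussed with Table 4.1.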

Table 4.1 summarizes this structure for the 
data contained in the 2003 full-sample file. This 
file contains data for all of the interviews in 
2003 and the first six months of 2004. Sample 
designations are given as header rows, prefixed 
in the documentation with the letter “J” to des- 
ignate a survey done by the US Census Bureau 
for the US Department of Justice. 

The address-interview number appears on 
the line labeled “ADier_no”. This is the same
value as what the Census Bureau calls time in 
sample (TIS). The main cell entries give the 
panel (tens digit) and rotation numbers (ones 
digit). The row labels indicate the year and 
month in which residents of the address are 
scheduled to be interviewed. Empty cells indi- 
cate that no panel/rotation combination in the 
specified sample/rotation group/panel is sched- 
uled to be interviewed during the year and 
month given by the row label. For example, 
the empty cell in the first column and row 
“2003 Jan” indicates all seven interviews were 
completed before January 2003 for respondents 
in sample J20, panel 1, rotation-group 1. The 
empty cell in the last column of the same row 
(“2003 Jan”) indicates that interviewing had not 
begun by January 2003 for sample J21, panel 
1, rotation-group 6. The first interview for this 
group (sample J21, panel 1, rotation-group 6) 
began in July 2003, as indicated by the “16” 
(panel 1, rotation 6) entry in the last column 
Table 4.1 Rotation chart for selected years, samples J20 and J21ᵃ

                          Sample number
Year and month            J20                     J21

ADier_no:                          7   6   5   4   3   2   1
2003 Jan                          15  16  11  12  13  14  15
     Feb                          25  26  21  22  23  24  25
     Mar                          35  36  31  32  33  34  35
     Apr                          45  46  41  42  43  44  45
     May                          55  56  51  52  53  54  55
     Jun                          65  66  61  62  63  64  65

ADier_no:                              7   6   5   4   3   2   1
2003 Jul                              16  11  12  13  14  15  16
     Aug                              26  21  22  23  24  25  26
     Sep                              36  31  32  33  34  35  36
     Oct                              46  41  42  43  44  45  46
     Nov                              56  51  52  53  54  55  56
     Dec                              66  61  62  63  64  65  66

ADier_no:                                  7   6   5   4   3   2
2004 Jan                                  11  12  13  14  15  16
     Feb                                  21  22  23  24  25  26
     Mar                                  31  32  33  34  35  36
     Apr                                  41  42  43  44  45  46
     May                                  51  52  53  54  55  56
     Jun                                  61  62  63  64  65  66

ᵃ Extrapolated and adapted from “NCS/NCVS Rotation Chart: July 1994-June 1998” (US Department of Justice, Bureau of
Justice Statistics, 2006: pp. 443-444)



of row “2003 Jul.” Similarly, the “34” in
column 10 of the row for March 2004 indicates
that sample J21, panel 3, rotation-group 4 was
scheduled for its fourth interview during March
of 2004 (read the interview number from the
header rows titled “ADier_no”).

The full NCVS data collection is supplied by
the Inter-university Consortium for Political and
Social Research (ICPSR) at the University of
Michigan. The full data are stored in ASCII
format using a hierarchical structure. There are
four types of records in the full data files:

Address: One record per address per completed contact.
Household: One record per household per completed contact.
Person: One record per person per completed household interview.
Incident: One record per person/household per completed interview per crime-victimization incident.


All four record types are supplied in a single 
file containing a record-indicator flag. 
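
As a rough illustration of how such a mixed file can be separated by its record-indicator flag, consider the Python sketch below. The layout is hypothetical: the flag is assumed to be the first character of each line, the codes 1 through 4 and the whitespace-separated fields are invented for the toy data, and the real NCVS files follow the fixed-column layout given in the codebook.

```python
from collections import defaultdict

# Hypothetical record-type codes and layout, for illustration only.
RECORD_TYPES = {"1": "address", "2": "household", "3": "person", "4": "incident"}


def split_by_record_type(lines):
    """Group the lines of a mixed hierarchical file by their record type."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        flag = line[0]
        if flag not in RECORD_TYPES:
            raise ValueError(f"unknown record-indicator flag: {flag!r}")
        # Keep the remaining fields of the record as a list of strings.
        groups[RECORD_TYPES[flag]].append(line[1:].split())
    return dict(groups)


# Toy data: one address, one household, two persons, one incident.
toy_file = [
    "1 A001",
    "2 A001 H1",
    "3 A001 H1 P1",
    "3 A001 H1 P2",
    "4 A001 H1 P2 17",
]
records = split_by_record_type(toy_file)
print({name: len(rows) for name, rows in records.items()})
```

Each resulting group can then be treated as a separate table and linked to the others through the identification variables.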

A contact is not the same as a completed 
interview. A variable in the household data 
indicates whether each contact was, in fact, 
a completed interview. For the data files to 
date, over eighty percent (80.5%) of the
household records indicate completed
household-level interviews for all eight (22 − 15 + 1 = 8)
NCVS samples combined. Since a contact does 
not occur for every scheduled interview, the 
number of records per address, household, 
and person varies, with a maximum of six in 
public-release data. The number of incident 
records per person varies from zero to the num- 
ber of victimization incidents reported to have 
occurred during the reference period (prior six 
months). The exception to this is for series inci- 
dents, which are recorded on just one incident 
record. An incident is classified as a series inci- 
dent if an individual experiences six or more 
incidents within a reference period and can’t 
recall enough details of each to report them as 
separate incidents. There is, however, no the- 
oretical upper limit on the number of incident 
records. 


3.2 Restructuring the NCVS data 


The hierarchical organization of the NCVS data 
minimizes the space needed to store it. A “rect- 
angular file” containing all the data would be 
excessively large. Nonetheless, most statistical 
packages are not designed to analyze directly 
data stored in a hierarchical format like that of 
the NCVS. 

Tables 4.2, 4.3 and 4.4 present a highly sim- 
plified example of the restructuring needed to 
produce a rectangular person-level file contain- 
ing gender (V3018, constant over interviews) 


and one variable for each of the seven inter- 
views indicating (1) household income, (2) age, 
and (3) marital status at the time of the inter- 
view. Additionally, two variables per interview 
allow for up to two crime-victimization reports 
per interview. 

The layout for these variables for two 
households and three persons in the NCVS hier- 
archical format appears in Table 4.2. Where 
feasible, the table contains the variable names 
used in the NCVS documentation (e.g., V2026 
for household income). But the household ID 
(HH_ID), person ID (pers_ID) and household 
interview number (HHier_no) must be con- 
structed from more than one NCVS variable and 
are given names as shown in Table 4.2. For clar- 
ity, the table displays the three types of records 
in separate blocks, but the NCVS hierarchical 
data files are sorted by address ID and record 
type within the address ID.²

Table 4.3 shows the households, individuals, 
and victimization incidents merged together 
by household ID (HH_ID), person ID (pers_ID), 
and household interview number (HHier_no). 
This produces a file with data from each inter- 
view appearing on separate records. This format 
sometimes is called “long” format. 

Table 4.3 illustrates how this merge generates 
(1) duplication of data, and (2) a large amount 
of missing data. The duplication arises from 
two sources: repeating household information 
on each person record and repeating house- 
hold and person information for each incident 
report. In Table 4.3, all the household infor- 
mation for household 1 is listed twice, since 
two persons in the example reside in house- 
hold 1, and both household and person vari- 
ables are replicated for household 2, person 1, 


2 A complete description of how to use the
identification variables to construct a working
rectangular file from the NCVS hierarchical files
may be obtained directly from the first author
(Lawrence Hotchkiss, larryh@udel.edu) or at the
following URL:
http://gorilla.us.udel.edu/ncvs//NCVSdataPreparation.doc
Table 4.2 Hierarchical structure of the NCVS full data files

Household Records
HH_ID   HHier_no   V2026
1       2          13
1       3          13
1       4          13
1       5          13
1       6          14
1       7          14
2       3          9
2       4          98
2       5          8
2       7          9

Person Records
Pers_ID   HH_ID   HHier_no   V3014   V3015   V3018
1         1       2          47      1       1
1         1       3          47      1       1
1         1       4          48      1       1
1         1       5          48      1       1
1         1       6          49      4       1
1         1       7          49      4       1
2         1       2          15      5       2
2         1       3          16      5       2
2         1       4          16      5       2
2         1       5          17      5       2
2         1       6          17      5       2
2         1       7          18      5       2
1         2       3          35      4       2
1         2       4          35      4       2
1         2       5          36      4       2
1         2       7          37      4       2

Incident Records
Pers_ID   HH_ID   HHier_no   V4529
1         2       5          17
1         2       5          20

Definitions:
1 HH_ID = Household ID, constructed from 3 NCVS variables
2 HHier_no = Household interview number, calculated from NCVS variables (see below)
3 V2026 = Household income: 8: $20,000 < inc < 25,000; 9: $25,000 < inc < 30,000; 13: $50,000 < inc
  < 75,000; 14: inc > $75,000
4 Pers_ID = Person ID, constructed from 4 NCVS variables
5 V3014 = Age (years to last birthday, allocated)
6 V3015 = Marital status (current interview): 1 = Married; 2 = Widowed; 3 = Divorced; 4 = Separated;
  5 = Never married; 8 = Residue; 9 = Out of universe
7 V3018 = Gender (1 = male; 2 = female, allocated; 8 = Residue; 9 = Out of universe)
8 V4529 = Type of crime (TOC) classification: 17 = Assault without weapon without injury, 20 = Verbal
  threat of assault
Table 4.3 Merged household, person and incident records by HH_ID, Pers_ID and ADier_no (long format)

HH_ID   Pers_ID   HHier_no   V2026   V3014   V3015   V3018   V4529
1       1         2          13      47      1       1       .
1       1         3          13      47      1       1       .
1       1         4          13      48      1       1       .
1       1         5          13      48      1       1       .
1       1         6          14      49      4       1       .
1       1         7          14      49      4       1       .
1       2         2          13      15      5       2       .
1       2         3          13      16      5       2       .
1       2         4          13      16      5       2       .
1       2         5          13      17      5       2       .
1       2         6          14      17      5       2       .
1       2         7          14      18      5       2       .
2       1         3          9       35      4       2       .
2       1         4          98      35      4       2       .
2       1         5          8       36      4       2       17
2       1         5          8       36      4       2       20
2       1         7          9       37      4       2       .

Note: See Table 4.2 for definition of the variables.


interview 5. The missing data are due to the fact 
that most households and persons report no vic- 
timizations during each interview. In Table 4.3, 
for example, 15 out of the 17 values for type of 
crime (V4529) are missing. 
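
The merge that turns the records of Table 4.2 into the long format of Table 4.3 can be sketched in a few lines of Python. This is a minimal illustration using only the toy values from Table 4.2, not the procedure used to build the working files; the function name merge_long is hypothetical, and missing values are represented by None.

```python
# Household records from Table 4.2: (HH_ID, HHier_no) -> V2026
households = {
    (1, 2): 13, (1, 3): 13, (1, 4): 13, (1, 5): 13, (1, 6): 14, (1, 7): 14,
    (2, 3): 9, (2, 4): 98, (2, 5): 8, (2, 7): 9,
}

# Person records from Table 4.2: (Pers_ID, HH_ID, HHier_no, V3014, V3015, V3018)
persons = [
    (1, 1, 2, 47, 1, 1), (1, 1, 3, 47, 1, 1), (1, 1, 4, 48, 1, 1),
    (1, 1, 5, 48, 1, 1), (1, 1, 6, 49, 4, 1), (1, 1, 7, 49, 4, 1),
    (2, 1, 2, 15, 5, 2), (2, 1, 3, 16, 5, 2), (2, 1, 4, 16, 5, 2),
    (2, 1, 5, 17, 5, 2), (2, 1, 6, 17, 5, 2), (2, 1, 7, 18, 5, 2),
    (1, 2, 3, 35, 4, 2), (1, 2, 4, 35, 4, 2), (1, 2, 5, 36, 4, 2),
    (1, 2, 7, 37, 4, 2),
]

# Incident records from Table 4.2: (Pers_ID, HH_ID, HHier_no, V4529)
incidents = [(1, 2, 5, 17), (1, 2, 5, 20)]


def merge_long(households, persons, incidents):
    """Left-join persons to households and incidents on the ID/interview keys."""
    incident_lookup = {}
    for pers_id, hh_id, ier_no, toc in incidents:
        incident_lookup.setdefault((hh_id, pers_id, ier_no), []).append(toc)

    rows = []
    for pers_id, hh_id, ier_no, age, marital, gender in persons:
        income = households.get((hh_id, ier_no))  # household data repeat per person
        # One output row per incident; a single row with V4529 missing otherwise.
        for toc in incident_lookup.get((hh_id, pers_id, ier_no), [None]):
            rows.append({
                "HH_ID": hh_id, "Pers_ID": pers_id, "HHier_no": ier_no,
                "V2026": income, "V3014": age, "V3015": marital,
                "V3018": gender, "V4529": toc,
            })
    rows.sort(key=lambda r: (r["HH_ID"], r["Pers_ID"], r["HHier_no"]))
    return rows


long_rows = merge_long(households, persons, incidents)
n_missing = sum(row["V4529"] is None for row in long_rows)
print(len(long_rows), n_missing)  # 17 15
```

The output reproduces the 17 records of Table 4.3, of which 15 have no type-of-crime value.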

Table 4.4 summarizes a layout variously 
called a rectangular file, flat file or wide-format 
file. The data in this example produce 39 vari- 
ables in the rectangular file, in contrast to just 
8 in the long-format file in Table 4.3. The con- 
ventional representation of a data file associates 
each variable with a column and each person 
(observation) with a row, but a table containing 
39 columns and three rows cannot be displayed 
on the printed page. Table 4.4 shows the trans- 
position of the usual representation of a data 
file with the transposed observations wrapped 
into three columns to conserve space. 

The rectangular file represents data for the 
different interviews and incidents by different 
variables (columns rather than rows). Gender 
(V3018) appears just once per person, since it 
does not change. A variable must be defined for 
each of the other non-ID household and person 


variables for each of the seven possible inter- 
view numbers, because they are not necessarily 
constant over interviews. In the example, each 
of these new variables is given a numeric suf- 
fix corresponding to the interview number, e.g., 
V3015_1, V3015_2,..., V3015_7. 

Variables appearing in the household records 
are attached to each person in the household, 
generating substantial duplication. Variables 
appearing on the incident records generate even 
more variables in the rectangular file. A vari- 
able must be defined for each interview and 
each incident per interview. In the example, the 
number of incidents is capped at two, so 14 
variables are needed to represent type of crime, 
which requires just one variable in the inci- 
dent records. In Table 4.4, variable names for 
the incident variables are formed by adding a 
double suffix to the original variable name, a 
letter to index incidents and a number to index 
interviews (V4529_A1 ... V4529_B7).

Even though no person or household appears 
in the data for more than six interviews, seven 
variables must be reserved for each variable in 
Table 4.4 “Flat file” merged household, person
and incident records

No.   Variable    Values
1     HH_ID       1     1     2
2     Pers_ID     1     2     1
3     V2026_1     .     .     .
4     V2026_2     13    13    .
5     V2026_3     13    13    9
6     V2026_4     13    13    98
7     V2026_5     13    13    8
8     V2026_6     14    14    .
9     V2026_7     14    14    9
10    V3014_1     .     .     .
11    V3014_2     47    15    .
12    V3014_3     47    16    35
13    V3014_4     48    16    35
14    V3014_5     48    17    36
15    V3014_6     49    17    .
16    V3014_7     49    18    37
17    V3015_1     .     .     .
18    V3015_2     1     5     .
19    V3015_3     1     5     4
20    V3015_4     1     5     4
21    V3015_5     1     5     4
22    V3015_6     4     5     .
23    V3015_7     4     5     4
25    V3018       1     2     2
26    V4529_A1    .     .     .
27    V4529_A2    .     .     .
28    V4529_A3    .     .     .
29    V4529_A4    .     .     .
30    V4529_A5    .     .     17
31    V4529_A6    .     .     .
32    V4529_A7    .     .     .
33    V4529_B1    .     .     .
34    V4529_B2    .     .     .
35    V4529_B3    .     .     .
36    V4529_B4    .     .     .
37    V4529_B5    .     .     20
38    V4529_B6    .     .     .
39    V4529_B7    .     .     .



the NCVS household and person data that is not 
constant over all interviews. And seven times 
the number of incidents is required for variables 
residing on incident records. Seven interview 
numbers are needed, even though no individual 
or household appears in the data more than six 


times, because interview numbers for replace- 
ment households span 1 to 6, inclusive, but 
interview numbers for nonreplacement house- 
holds span 2 to 7, inclusive. 
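
The reshaping from long to wide format can be sketched as follows. This is an illustrative Python fragment rather than the code used for the analyses in this chapter: the function name to_wide is hypothetical, the underscore in names such as V4529_A5 is an assumed naming convention, and only the records for household 2, person 1 from Table 4.3 are used.

```python
from string import ascii_uppercase

# Long-format rows for household 2, person 1, as in Table 4.3:
# (HH_ID, Pers_ID, HHier_no, V2026, V3014, V3015, V3018, V4529); None = missing.
long_rows = [
    (2, 1, 3, 9, 35, 4, 2, None),
    (2, 1, 4, 98, 35, 4, 2, None),
    (2, 1, 5, 8, 36, 4, 2, 17),
    (2, 1, 5, 8, 36, 4, 2, 20),
    (2, 1, 7, 9, 37, 4, 2, None),
]

MAX_INTERVIEWS = 7  # interview numbers run from 1 to 7
MAX_INCIDENTS = 2   # cap used in the example of Table 4.4


def to_wide(rows):
    """Collapse long-format rows into one record per (HH_ID, Pers_ID)."""
    wide = {}
    letters = ascii_uppercase[:MAX_INCIDENTS]
    for hh_id, pers_id, ier_no, income, age, marital, gender, toc in rows:
        rec = wide.get((hh_id, pers_id))
        if rec is None:
            # Reserve every interview-specific variable, initially missing.
            rec = {"HH_ID": hh_id, "Pers_ID": pers_id, "V3018": gender}
            for k in range(1, MAX_INTERVIEWS + 1):
                for name in ("V2026", "V3014", "V3015"):
                    rec[f"{name}_{k}"] = None
                for letter in letters:
                    rec[f"V4529_{letter}{k}"] = None
            wide[(hh_id, pers_id)] = rec
        rec[f"V2026_{ier_no}"] = income
        rec[f"V3014_{ier_no}"] = age
        rec[f"V3015_{ier_no}"] = marital
        if toc is not None:
            # Fill the first free incident slot; incidents beyond the cap are dropped.
            for letter in letters:
                slot = f"V4529_{letter}{ier_no}"
                if rec[slot] is None:
                    rec[slot] = toc
                    break
    return wide


wide = to_wide(long_rows)
person = wide[(2, 1)]
print(person["V2026_4"], person["V4529_A5"], person["V4529_B5"])  # 98 17 20
```

Interviews that never took place, such as interviews 1, 2, and 6 for this person, simply remain missing in the reserved variables.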

As you can see, the wide-format file gener- 
ates enormous redundancy. The average size 
of households is about 2.2, so household vari- 
ables are repeated over two times on average 
in a file where observations are individuals. 
Table 4.5 gives additional indication of the mag- 
nitude of redundancy. It shows a frequency 
distribution of the number of interviews and 
a cross-tabulation of number of incidents by 
household interview number. The top panel 
of the table shows, for example, that 255,854 
out of a total of 737,946 (34.7%) persons com- 
pleted just one interview. For these persons, 
six of their seven household and person vari- 
ables in each sequence would be missing in a 
rectangular file. The bottom panel of Table 4.5 
indicates an even more extreme proportion of 
missing data. When provision is made for the 
maximum number of incidents (13), each vari- 
able on the NCVS incident records generates 
7 × 13 = 91 variables in the rectangular file,
V4529_A1 ... V4529_M7, for instance. Yet, as
the tabulation demonstrates, the vast majority
of these values are missing.³
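
The scale of this redundancy can be checked directly from the totals in Table 4.5. The short Python calculation below is a worked check, not part of the original analysis; it assumes, as the accompanying footnote does, that incident records themselves contain no missing values, and the variable names are chosen for illustration.

```python
# Person records by number of incidents ("Total" column of Table 4.5).
records_by_incident_count = {
    0: 1_887_994, 1: 112_836, 2: 16_185, 3: 3_670, 4: 1_144, 5: 376,
    6: 125, 7: 36, 8: 11, 9: 12, 10: 7, 11: 2, 13: 2,
}
PERSONS = 737_946    # persons in the top panel of Table 4.5
MAX_INTERVIEWS = 7
MAX_INCIDENTS = 13

slots_per_person = MAX_INTERVIEWS * MAX_INCIDENTS  # 91 variables per incident item
total_cells = slots_per_person * PERSONS           # cells reserved in the flat file
# Each person record with n incidents fills exactly n of the reserved cells.
filled_cells = sum(n * count for n, count in records_by_incident_count.items())
missing_cells = total_cells - filled_cells
missing_share = missing_cells / total_cells

print(slots_per_person, total_cells, filled_cells, missing_cells)
print(f"{missing_share:.1%}")
```

The calculation gives 67,153,086 reserved cells, of which 163,988 are filled and 66,989,098 (99.8%) are missing.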

Often, however, one needs to keep just one 
variable per interview in the wide-format file 
for each variable in the incident records. Many 
analyses require only summaries of incidents, 
such as the number of violent-crime victimiza- 
tions reported per interview, or simply a flag 
set to 1 if one or more violent victimizations is 
reported, and zero otherwise. In these situations, 
each variable used from the incident file requires 
just one variable per interview rather than 13. 
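
A per-interview summary of this kind might be computed as in the Python sketch below. The set of type-of-crime codes treated as violent, the variable names n_violent and any_violent, and the function name are all assumptions made for illustration; an actual analysis would take the classification of V4529 codes from the NCVS codebook.

```python
from collections import Counter

# Assumption for illustration only: these V4529 codes are treated as violent.
VIOLENT_CODES = {17, 20}

# Completed interviews for household 2, person 1: (HH_ID, Pers_ID, HHier_no)
interviews = [(2, 1, 3), (2, 1, 4), (2, 1, 5), (2, 1, 7)]

# Incident records for the same person: (HH_ID, Pers_ID, HHier_no, V4529)
incidents = [(2, 1, 5, 17), (2, 1, 5, 20)]


def summarize_violent(interviews, incidents):
    """Return the count and a 0/1 flag of violent incidents per interview."""
    counts = Counter(
        (hh_id, pers_id, ier_no)
        for hh_id, pers_id, ier_no, toc in incidents
        if toc in VIOLENT_CODES
    )
    summary = {}
    for key in interviews:
        n = counts.get(key, 0)  # interviews without incident records count as zero
        summary[key] = {"n_violent": n, "any_violent": int(n > 0)}
    return summary


summary = summarize_violent(interviews, incidents)
print(summary[(2, 1, 5)])  # {'n_violent': 2, 'any_violent': 1}
print(summary[(2, 1, 3)])  # {'n_violent': 0, 'any_violent': 0}
```

Only completed interviews receive a value, so interviews that did not take place remain missing rather than being coded as zero victimizations.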


3 The 782,316 persons in our working data file
generate 91 × 737,946 = 67,153,086 data cells for
each variable in the incident records kept in the
flat file. Of these, 66,989,098 contain missing
values (99.8%, assuming incident records contain
no missing values).
Table 4.5 Distribution of number of interviews and number of incidents by interview number

Number of     Frequency    Cumulative
interviews                 frequency
1               255,854       255,854
2               150,825       406,679
3               101,584       508,263
4                76,836       585,099
5                64,282       649,381
6                88,565       737,946

Cross-tabulation: Household interview number (HHier_no) by number of incidents

Number of                                  Interview number
incidents          1         2         3         4         5         6         7       Total
0            151,712   375,746   323,818   289,046   265,803   246,185   235,684   1,887,994
1             17,222    26,175    19,204    15,346    12,246    11,715    10,928     112,836
2              3,132     4,140     2,662     2,044     1,446     1,413     1,348      16,185
3                794       984       607       412       272       314       287       3,670
4                275       322       182        99        90        90        86       1,144
5                 88       102        59        41        25        31        30         376
6                 22        36        24        14         7        14         8         125
7                 11         9         4         6         3         2         1          36
8                  4         3         2         1         0         1         0          11
9                  2         2         2         2         1         1         2          12
10                 2         2         1         0         1         1         0           7
11                 0         2         0         0         0         0         0           2
13                 0         1         0         0         0         1         0           2
Total        173,264   407,524   346,565   307,011   279,894   259,768   248,374   2,022,400

Notes: (1) Cell entries in the top panel are counts of persons, and cell entries in the bottom panel are person records, both
accumulated over all NCVS samples, 15 through 22, excluding cases with out-of-range address numbers and inconsistent
reports of gender; (2) Excludes type Z noninterviews.
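
The cell counts quoted in the footnote on missing values can be reproduced from the Total column of the bottom panel of Table 4.5. The following is a minimal sketch; the variable names are ours, and it assumes, as the footnote does, that incident records themselves contain no missing values.

```python
# Person records by number of incidents (Total column, bottom panel of Table 4.5).
records_by_incidents = {
    0: 1_887_994, 1: 112_836, 2: 16_185, 3: 3_670, 4: 1_144, 5: 376,
    6: 125, 7: 36, 8: 11, 9: 12, 10: 7, 11: 2, 13: 2,
}

N_PERSONS = 737_946      # persons in the top panel of Table 4.5
MAX_INTERVIEWS = 7       # interview-number slots per person in the wide file
MAX_INCIDENTS = 13       # incident slots per interview

# Every incident variable occupies 7 x 13 = 91 columns per person.
slots_per_person = MAX_INTERVIEWS * MAX_INCIDENTS
total_cells = slots_per_person * N_PERSONS

# A person record with k incidents fills exactly k of its 13 incident slots.
filled_cells = sum(k * n for k, n in records_by_incidents.items())
missing_cells = total_cells - filled_cells
missing_share = missing_cells / total_cells

print(f"{total_cells:,} cells, {missing_cells:,} missing ({missing_share:.1%})")
```

Only the 163,988 reported incidents fill a cell; every other incident slot in the rectangular file is empty.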


Analyses reported later in this chapter com- 
pare a model reported by Janet Lauritsen (2001) 
for all interviews combined to models with 
the same specification broken out by interview 
number. These analyses were done twice, once 
using a wide-format file and once using the 
long-format file. In the first instance, the anal- 
ysis for each interview number used variables 
such as age2, age3,..., age7, and the second 
approach used “by-variable” processing. All the
numeric output from the two approaches matches
exactly. The long-format file contains a sub- 
set consisting of 140 variables. A wide-format 
file capping the number of incidents at 5 and 
including just 72 of the variables in the long- 
format file generated 885 variables. The size of 
the long-format file is just over one gigabyte, 
and the size of the wide-format file is over two 
gigabytes (setting the default variable length = 3 
in both cases). 
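
The equivalence of the two layouts can be illustrated with a small sketch. The records, the `person_id` and `interview` field names, and the helper functions are hypothetical; only the `age2`, ..., `age7` naming follows the text. The long layout is summarized by grouping on the interview number (the analogue of by-variable processing), and the wide layout by reading one column per interview.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: one row per person per interview.
long_rows = [
    {"person_id": "A", "interview": 2, "age": 30},
    {"person_id": "A", "interview": 3, "age": 30},
    {"person_id": "B", "interview": 2, "age": 45},
    {"person_id": "B", "interview": 3, "age": 46},
    {"person_id": "C", "interview": 3, "age": 19},
]


def to_wide(rows, var, interviews=range(2, 8)):
    """Pivot long rows into one record per person with var2..var7 columns."""
    wide = defaultdict(lambda: {f"{var}{k}": None for k in interviews})
    for r in rows:
        wide[r["person_id"]][f"{var}{r['interview']}"] = r[var]
    return dict(wide)


def mean_by_interview_long(rows, var):
    """By-group processing: group the long rows on the interview number."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["interview"]].append(r[var])
    return {k: mean(v) for k, v in sorted(groups.items())}


def mean_by_interview_wide(wide, var, interviews=range(2, 8)):
    """Wide processing: one column (age2, age3, ...) per interview."""
    out = {}
    for k in interviews:
        col = f"{var}{k}"
        vals = [rec[col] for rec in wide.values() if rec[col] is not None]
        if vals:
            out[k] = mean(vals)
    return out


wide_rows = to_wide(long_rows, "age")
long_means = mean_by_interview_long(long_rows, "age")
wide_means = mean_by_interview_wide(wide_rows, "age")
```

Both summaries agree, but the wide records carry a placeholder for every interview a person did not complete, which is the source of the redundancy discussed above.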
3.3 Attrition and nonresponse


Even with the high response rate (95%) 
reported in the NCVS documentation, only 
about 23% of addresses appear in the house- 
hold records for all six possible interviews in 
the public-use files, and over 15% were, in fact, 
classified as interviewed all six times. This is 
due in part to the fact that the maximum num- 
ber of possible interviews is less than seven for 
some households, but most of it is due to attri- 
tion, and much attrition occurs at the household 
and person levels. 

Nonetheless, the attrition rates are good by lon- 
gitudinal survey standards. As Table 4.6 reports, 
about 70% of persons interviewed during inter- 
views 2 through 6 also were interviewed dur- 
ing their next scheduled interview. But the table 
also shows that the percentage of persons who 
are interviewed after one or more intervening 
interviews rapidly dwindles. For example, of 
those interviewed at interview 2, 53.14% were 
interviewed at interview 4, and this declines to 
27.52% by interview 7. These percentages under- 
state attrition, since not all of those completing 
interview 2 and interview 7, for example, com- 
pleted all the intervening interviews. 

Additionally, not all person records in the 
NCVS data files are associated with completed 
interviews. If a household is classified as a 
completed interview but an individual member 
of the household could not be interviewed, the 


Table 4.6 Percentage of persons interviewed during an interview who also were interviewed
during the subsequent interview

Subsequent                 First interview number
interview number       2        3        4        5        6
3                  68.59
4                  53.14    69.02
5                  41.95    54.53    70.65
6                  33.84    44.14    57.48    72.47
7                  27.52    36.27    47.66    60.24    74.42

Note: Unweighted estimates.


noninterview is classified as a Type Z nonin- 
terview, and the person record for this person 
appears in the public-use data with personal 
information carried forward from a previous 
interview. The person weight (V3080) for type Z 
records is set to zero, and a zero weight appears 
to be the only definite indication of a noninter- 
view. The interviewed flags in Table 4.6 were 
set to zero when person weights were zero.* 
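
As a rough sketch of how percentages like those in Table 4.6 can be computed, the following code derives an interviewed flag from the person weight (zero weight meaning a Type Z noninterview) and then compares two retention measures. The person IDs and weights are invented for illustration, and the function names are ours.

```python
# Hypothetical person weights (V3080) keyed by household interview number.
# A weight of 0.0 marks a Type Z noninterview; a missing key means no record.
weights = {
    "P1": {2: 1510.2, 3: 1498.7, 4: 0.0, 5: 1502.3},
    "P2": {2: 1320.0, 3: 0.0},
    "P3": {2: 990.5, 3: 1001.1, 4: 1003.9, 5: 998.0},
    "P4": {3: 1200.4, 4: 1188.8},
}


def interviewed(person, k):
    """Interviewed flag: 1 only if a record exists with a nonzero weight."""
    return weights[person].get(k, 0.0) > 0


def pct_reinterviewed(first, later):
    """Percent of those interviewed at `first` who were interviewed at `later`."""
    base = [p for p in weights if interviewed(p, first)]
    kept = [p for p in base if interviewed(p, later)]
    return 100 * len(kept) / len(base)


def pct_continuous(first, later):
    """Percent interviewed at every interview from `first` through `later`."""
    base = [p for p in weights if interviewed(p, first)]
    kept = [p for p in base
            if all(interviewed(p, k) for k in range(first, later + 1))]
    return 100 * len(kept) / len(base)


print(round(pct_reinterviewed(2, 5), 2), round(pct_continuous(2, 5), 2))
```

In this toy example, two of the three persons interviewed at interview 2 are also interviewed at interview 5, but only one completed every intervening interview. This is the sense in which the Table 4.6 percentages understate attrition.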


4 Example: Prediction of violent 
crime victimization 


One of the primary motivations for reformatting 
the NCVS data into a long-format or wide-format 
file is to test models of the determinants of crime 
victimization. An excellent example of this type 
of work is contained in a paper by Janet Lauritsen
(2001). She reports several logistic-regression 
models predicting violent-crime victimization 
using as regressors: age, gender, marital status 
(married), household income, length of resi- 
dence at the current address, residence in the 
central city of an MSA, and three contextual-level
variables defined by features of the census
tract containing the sample address: a disadvantagement
index, an immigration index and an
index of area instability. One set of six mod- 
els uses only the individual-level predictors. 
One of these six models applies to all violent 
victimizations in all geographic areas. A sec- 
ond model is restricted to incidents reported to 
have occurred in the neighborhood of one’s res- 
idence. The other four are for: non-central-city 
and all neighborhoods; non-central-city within 
neighborhood; central-city and all neighbor- 
hoods; and central-city within neighborhood. 
A second set of analyses adds the contextual- 
level variables to the set of individual predictors. 


*See the NCVS Codebook (US Department of Jus- 
tice, 2006, p. 419). The authors are grateful to Jeremy 
Shimer of the US Census Bureau for clarifying this 
issue. 
The paper uses a special release of the NCVS
data for 1995 containing confidential identifiers 
needed to match the NCVS sample addresses 
to census tracts as required to merge the con- 
textual variables onto NCVS data. NCVS data 
from households, persons, and incidents were 
merged into a single long-format analysis file 
containing one record per person per inci- 
dent. Inferential statistics were corrected for the 
dependence among the observations. 

This analysis strategy implies that the model
is correct for all interviews, including the
first unbounded interview for replacement 
households. It also precludes a precise estimate 
of the risk of victimization during a six-month 
interval, because the dependent variable does 
not exactly indicate whether an individual 
was a victim of at least one violent-crime 
incident during the reference period. Instead, 
the dependent variable for each person is 
assigned 0 if no incident records match to the 
person and 1 for each matching incident record 
containing a type-of-crime code defined as a 
violent crime. Since few persons report more 
than one incident per interview (Table 4.5), 
however, this latter consideration probably has 
little practical impact. 

This chapter extends the work reported by 
Lauritsen (2001) by (1) estimating separate 
logistic-regression models for each interview, 2 
through 7, and (2) defining the dependent vari- 
able to be an indicator set equal to 1 for persons 
who reported at least one victimization classi- 
fied as a violent crime during each interview 
and 0 otherwise.⁵ Predicted probabilities from
a binary regression with this measure of victim- 
ization therefore indicate the risk of being the 
victim of a violent crime during a six-month 
interval. 
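
The difference between the two definitions of the dependent variable can be sketched as follows. The household and person identifiers and the incident codes are hypothetical; the violent-crime range 1 through 14 for v4529 is the one given in the variable list. How the person-per-incident file treats a nonviolent incident is our assumption (coded 0), since the text states only the 0 and 1 cases explicitly.

```python
VIOLENT_CODES = range(1, 15)  # v4529 codes 1 through 14, inclusive

# Hypothetical type-of-crime codes (v4529) per person and interview.
incident_codes = {
    ("H001", "01", 2): [],        # no incidents reported
    ("H001", "02", 2): [11, 13],  # two incidents, both in the violent range
    ("H002", "01", 2): [20],      # one incident with a code outside 1..14
}


def per_incident_outcomes(codes_by_person):
    """One record per person per incident (the layout used for replication)."""
    outcomes = []
    for key, codes in codes_by_person.items():
        if not codes:
            outcomes.append((key, 0))  # no matching incident record
        for code in codes:
            # Assumption: a nonviolent incident record is coded 0.
            outcomes.append((key, int(code in VIOLENT_CODES)))
    return outcomes


def per_interview_indicator(codes_by_person):
    """One record per person per interview: 1 if any violent incident."""
    return {key: int(any(c in VIOLENT_CODES for c in codes))
            for key, codes in codes_by_person.items()}


incident_file = per_incident_outcomes(incident_codes)
VCvictim = per_interview_indicator(incident_codes)
```

The person with two violent incidents contributes two records with outcome 1 under the first definition but a single record with outcome 1 under the second, so only the second yields a per-interval risk.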

We report comparisons to the Lauritsen 
model for overall violent-crime victimization 


⁵ We used Lauritsen’s definition of the dependent
variable for our replication of her model (column 1,
Table 4.7).


using individual-level regressors. The vari- 
ables are: 


VCvictim        Violent-crime victim: Victim of at least 1 violent crime
(dependent      during the reference period for each interview [at least
variable)       one incident record associated with each person had
                type-of-crime code (v4529) classified in the interval
                1 through 14, inclusive].

Age             Age in years (v3014)

Male            Dummy variable for male: male = 1 (v3018 = 1)

Black           Dummy variable for race reported as Black (v3023 = 2)⁶

Married         Dummy variable, married = 1 (v3015 = 1)

HHinc           Household income (v2026 in original 14 ordered categories)

eveningsIn      Ordinal indicator of frequency of spending evenings at
                home (v3029)

Tenure          Number of years lived at current residence (constructed
                from v3031 and v3032)

cntrCityMSA     Lived in the central city of an MSA (v2129 = 1)

IncomeMDD       Missing-data dummy for HHinc

EveningsMDD     Missing-data dummy for eveningsIn

TenureMDD       Missing-data dummy for tenure


⁶ Year-file 2002 and later used the expanded census
race categories permitting respondents to select mix- 
tures of races. The Black dummy variable was coded 
to 1 if any mixture including Black was checked. The 
analogous definition was applied to White (used in 
the subsetting for all years, column 8 of Table 4.7). 
Although Lauritsen (2001) did not mention
missing-data dummy variables, we had to elim- 
inate missing values for income, evenings spent 
in house, and tenure to match (approximately) 
the sample size she reports. We set missing 
values for these three variables to their respec- 
tive means and set the companion missing- 
data-dummies to 1. The coefficients associated 
with the missing-data-dummies therefore esti- 
mate the degree to which the predicted proba- 
bility of being a victim of violent crime for those 
with missing values differs from the predicted 
probability using the mean on the correspond- 
ing “parent” variable. 
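
A minimal sketch of this mean-substitution scheme follows. The function name and the sample values are hypothetical; `None` stands in for a missing value.

```python
from statistics import mean


def impute_with_dummy(values):
    """Replace missing values by the observed mean and flag them with a dummy."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    dummy = [1 if v is None else 0 for v in values]
    return filled, dummy


# Hypothetical household income categories with two missing values.
hhinc_raw = [4, None, 10, 7, None]
HHinc, IncomeMDD = impute_with_dummy(hhinc_raw)
```

Because the imputed cases all sit at the mean of the parent variable, the coefficient on the dummy captures only the shift associated with having a missing value, which is the interpretation given above.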

Table 4.7 reports (1) our replication of the 
Lauritsen model for overall violent-crime vic- 
timization (column 1), (2) a set of analyses 
consisting of one model for each of the (poten- 
tially) bounded interviews, 2 through 7 with 
the base year restricted to 1994, 1995 and 1996 
(columns 2 through 7), and (3) one model for 
interview 2 with no restriction on the base 
year and year included as one of the regres- 
sors (last column). The restrictions on the base 
year were imposed to confine the analysis to 
the time period around 1995. All the calcu- 
lations exclude cases of inconsistent gender 
reports and persons not classified as either 
Black or White. The latter restriction follows 
the subsetting used by Lauritsen. The replica- 
tion of Lauritsen’s model also excludes per- 
son records with invalid interview numbers, 
and the rest of the models exclude individu- 
als residing in households for which one or 
more address-interview numbers are out of 
range (because valid household/person inter- 
view numbers cannot be assured unless all 
address-interview numbers are valid for a 
household or person). 

The replication sample (column 1) contains 
173,487 person/incident records, after elimi- 
nating all persons with invalid address IDs 
and persons not classified as Black or White. 
Lauritsen reports 171,949 observations for her 
individual-level sample. So our replication 


sample is slightly larger than the Lauritsen sam- 
ple, and we have not been able to determine the 
reason(s). Column 1 shows estimates close to 
those reported in Lauritsen’s published work. 
The parameter estimates and pattern of statisti- 
cal significance also match closely, except that 
we get a coefficient for Black of 0.202 instead 
of 0.021 reported by Lauritsen. 

The six models designed to check whether
effect estimates depend on the interview number
(columns 2 through 7) conform roughly to
the Lauritsen results; there certainly are no dra- 
matic departures from her estimates. But the 
sample sizes are large enough that it is unlikely 
the variation over interviews is entirely due 
to sampling error. And there is enough vari- 
ation across the interviews to suggest addi- 
tional work is needed to determine the cause 
of it. If attrition were random, model estima- 
tion would not depend on the interview num- 
ber. Yet, the estimated coefficient for male at 
interview 2 (OR=1.51) is about 3.4 times the 
estimate at interview 7 (OR = 1.13). In fact, 
the absolute magnitude of every effect estimate, 
except tenure, central-city residence and the 
missing-data-dummy variables for interview 7, 
is smaller than the corresponding magnitude 
for interview 2. The estimated effect of Black 
varies quite substantially over the interviews. 
Even with the large sample, it more often is 
not significant than significant. Lauritsen also 
found variation in the effect of Black among 
the several models she reported. The effect of 
age remains fairly consistent until interview 5 
when it begins to decline, ending at interview 
7 at just under half its earlier magnitudes. The 
estimated effect of tenure (length of residence 
at the current address) also varies sporadically 
over interviews. 
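
The odds ratios quoted for the male coefficient follow directly from exponentiating the logit coefficients in Table 4.7. A short check, with variable names of our choosing:

```python
import math

# Coefficients for Male by interview number, copied from Table 4.7.
male_coef = {2: 0.410, 3: 0.326, 4: 0.404, 5: 0.390, 6: 0.288, 7: 0.120}

# An odds ratio is the exponentiated logit coefficient.
male_or = {k: math.exp(b) for k, b in male_coef.items()}

# Ratio of the interview 2 coefficient to the interview 7 coefficient.
coef_ratio = male_coef[2] / male_coef[7]

for k in sorted(male_or):
    print(f"interview {k}: b = {male_coef[k]:.3f}, OR = {male_or[k]:.2f}")
```

Note that the factor of about 3.4 compares the coefficients (log-odds), not the odds ratios themselves.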

The last column of Table 4.7 reports estimates 
of the model with no restriction on the base 
year and the variable year added as one predic- 
tor. These estimates also conform fairly well to 
the Lauritsen model but, again, indicate several 
modest deviations from it. The most interesting 


Table 4.7 Logistic regressions predicting violent-crime victimization

                                               1994 ≤ Base Year ≤ 1996                                        All years
Parameter      Lauritsen¹    Interview 2   Interview 3   Interview 4   Interview 5   Interview 6   Interview 7   Interview 2
               estimate      estimate      estimate      estimate      estimate      estimate      estimate      estimate
Intercept      -2.541*       -2.820***     -2.816***     -3.226***     -3.045***     -3.386***     -4.043***     71.448***
Year                                                                                                             -0.037***
Age            -0.031***     -0.033***     -0.032***     -0.031***     -0.025***     -0.025***     -0.015***     -0.027***
Male            0.334***      0.410***      0.326***      0.404***      0.390**       0.288*        0.120         0.360***
Black           0.202*        0.188†        0.107        -0.020         0.136        -0.049         0.138         0.115†
Married        -0.694***     -0.682***     -0.590***     -0.759***     -0.680***     -0.555***     -0.478***     -0.708***
Income         -0.047***     -0.032***     -0.041**      -0.029*       -0.059**      -0.019         0.006        -0.033***
Evenings in    -0.098**      -0.066*       -0.080        -0.048        -0.093        -0.122†       -0.023        -0.086***
Tenure         -0.037***     -0.012*       -0.019*       -0.013*       -0.033**      -0.015        -0.020*       -0.015***
Central city    0.322***      0.314***      0.288**       0.506***      0.237†        0.345*        0.388**       0.270***
MDD income     -0.299*       -0.103        -0.161        -0.320*       -0.017        -0.266        -0.324        -0.194**
MDD evenings   -0.037        -0.274        -0.613        -1.042        -0.044        -1.126        -1.254        -0.329***
  home
MDD central    -1.143†       -0.382        -0.174        -0.087        -0.028         0.699        -1.035        -0.596*
  city
Sample size    173,487       100,213       83,237        71,820        64,815        58,948        54,686        387,368

¹ All interviews, 1995

*** p < 0.0001   ** p < 0.001   * p < 0.01   † p < 0.05
finding here is the strong negative effect of year.
The odds ratio for the entire 13 years covered 
by these samples is 


OR = exp(-13 × 0.037) = exp(-0.481) = 0.618.


That is, the odds of being a victim of a violent
crime have declined by a factor of 0.618, a drop of
about 38%, over the 13-year period. This finding
agrees with estimates published by the Bureau of
Justice Statistics that show a steady decline in the incidence
of violent crime since 1993 (BJS, 2006). It is 
also noteworthy that the negative effect of year 
occurs even with a control for age, suggesting 
that the decline in the rate of reported violent- 
crime incidents is not due entirely to aging of 
the population. 
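
The cumulative odds ratio is simply the per-year odds ratio compounded over 13 years, as this short check shows. The variable names are ours; the coefficient is the Year estimate from the last column of Table 4.7.

```python
import math

b_year = -0.037   # Year coefficient, last column of Table 4.7
n_years = 13      # span of years covered by the samples

or_per_year = math.exp(b_year)          # odds ratio for one additional year
or_total = math.exp(b_year * n_years)   # odds ratio over the whole span

# Compounding the yearly odds ratio gives the same cumulative figure.
or_compounded = or_per_year ** n_years

print(f"per year: {or_per_year:.3f}, over {n_years} years: {or_total:.3f}")
```

Each additional year multiplies the odds of violent victimization by roughly 0.964, other regressors held constant.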

It is important to note that attrition in the 
sample is not random. Table 4.8 shows results 
of a logistic regression model predicting nonattrition
from interview 2 to interview 3 (1994 ≤
base year ≤ 1996).

The table indicates that attrition is depen- 
dent on all the variables in the Lauritsen model, 


Table 4.8 Logistic regression. Response variable: Interview 3 completed (1 = yes);
independent variables from interview 2*

Variable          B        Std err    p-value    exp(B)
Intercept       -0.541     0.032      <.0001     0.623
Age              0.012     0.000      <.0001     1.008
Male            -0.176     0.015      <.0001     0.936
Black            0.062     0.024      0.0099     1.176
Married          0.194     0.016      <.0001     1.256
HHincome         0.054     0.002      <.0001     1.067
eveningsIn       0.043     0.007      <.0001     1.053
Tenure           0.042     0.001      <.0001     1.059
cntrCityMSA     -0.150     0.016      <.0001     0.873
Income MDD      -0.229     0.021      <.0001     0.876
Evenings MDD    -0.323     0.060      <.0001     0.906
Tenure MDD      -0.276     0.092      0.0027     0.713

n = 107,054

* 1994 ≤ base year ≤ 1996.


with very small p-values. The small p-values, 
however, reflect the very large sample size. 
The magnitude of the effects generally is small 
to moderate. Nonetheless, these results suggest 
additional work is needed to identify reasons 
for attrition and its effects on analytical model- 
ing of crime victimization. 
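
The dependence of the p-values on sample size can be made concrete with a Wald test computed from the coefficients and standard errors in Table 4.8. This is a sketch under two assumptions of ours: that the reported p-values are two-sided normal (Wald) p-values, and, for the last line, that standard errors scale as one over the square root of the sample size.

```python
import math


def wald_test(b, se):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Coefficients and standard errors copied from Table 4.8.
z_black, p_black = wald_test(0.062, 0.024)
z_tenure_mdd, p_tenure_mdd = wald_test(-0.276, 0.092)
z_male, p_male = wald_test(-0.176, 0.015)

# Illustration only: with a sample 100 times smaller, standard errors would
# be about 10 times larger if they scale as 1/sqrt(n).
z_male_small, p_male_small = wald_test(-0.176, 0.015 * 10)

print(f"Black: z = {z_black:.2f}, p = {p_black:.4f}")
print(f"Male:  p = {p_male:.2e} (full sample), {p_male_small:.2f} (1/100 sample)")
```

The same Male coefficient that is overwhelmingly significant with n = 107,054 would not reach conventional significance in a sample of about a thousand persons.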


5 Summary and discussion 


This chapter describes how to use the
National Crime Victimization Survey (NCVS) 
for individual-level analytical statistical work 
in a longitudinal format. The final section of 
the chapter, in particular, illustrates the type 
of analyses that can be conducted with these 
working datasets. It replicates part of the work 
reported in a paper by Janet Lauritsen (2001). 
Our results closely match Lauritsen’s published 
work. In addition, we find (1) estimated effects 
of regressors such as gender, race, age and mar- 
ital status differ noticeably by interview num- 
ber, (2) sample attrition is significantly related 
to all the regressors in the Lauritsen model, and 
(3) the odds of being a victim of a violent crime 
substantially decline during the period 1973 to 
2005, even with controls for demographic variables
such as age, race and residence in the
central city of an MSA.

Our experience with the Lauritsen model sug- 
gests that one promising line of research is 
to examine simultaneously crime victimization 
and sample attrition as a linked process. Varia- 
tions on hazard models designating crime vic- 
timization as one (transient) state and sample 
attrition as another (nontransient) state may be 
estimated by maximum likelihood (Tuma and 
Hannan, 1984). As Tuma and Hannan cogently 
argue, an important advantage of explicitly 
modeling the hazard (or survival) function is 
that one can derive implications for related 
outcome variables that are, in fact, measured. 
For binary regression (victim versus nonvic- 
tim), it is informative to derive the link func- 
tion from a survival model. This is unlikely 
to imply a logistic regression. The exponential
survival function may be a good starting
point due to its simplicity, and it is not consis- 
tent with the logistic link function for a binary 
outcome. A generalization of this approach is 
to model the number of incidents (e.g., num- 
ber of victimizations); the exponential survival 
function generates a Poisson distribution for
these counts. For purposes of the NCVS, “series 
crimes” could then be counted as censored and 
the likelihood equation specified accordingly. 
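
To make the remark about link functions concrete, the sketch below assumes a constant hazard exp(xb) over a reference period of t = 0.5 years. The notation and function names are ours, not the chapter's. Under that assumption the probability of at least one victimization is 1 - exp(-exp(xb) t), which is linear in xb on the complementary log-log scale rather than the logit scale, and the number of incidents is Poisson with mean exp(xb) t.

```python
import math


def p_victim(xb, t=0.5):
    """P(at least one event in (0, t]) under a constant hazard exp(xb)."""
    return 1.0 - math.exp(-math.exp(xb) * t)


def cloglog(p):
    """Complementary log-log transform."""
    return math.log(-math.log(1.0 - p))


def logit(p):
    """Log-odds transform."""
    return math.log(p / (1.0 - p))


def poisson_pmf(k, xb, t=0.5):
    """P(N = k) when counts are Poisson with mean exp(xb) * t."""
    mu = math.exp(xb) * t
    return mu ** k * math.exp(-mu) / math.factorial(k)


for xb in (-3.0, -2.0, -1.0):
    p = p_victim(xb)
    print(f"xb = {xb:+.1f}: p = {p:.4f}, "
          f"cloglog = {cloglog(p):+.4f}, logit = {logit(p):+.4f}")
```

A one-unit change in xb moves the complementary log-log of p by exactly one, whereas the change in the logit depends on where xb starts. For rare events the two links nearly coincide, which may help explain why logistic regression is a common working model.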

For some purposes, matching household, per- 
son, and incident information may be done 
using the household ID, person ID, and the 
address interview number. In other instances, it 
is important to substitute the household inter- 
view number for the address interview num- 
ber. The merge operations needed to produce a 
sample like that used in the Lauritsen analyses 
and our replication of it do not depend on cor- 
rect identification of the household interview 
number. The household and person ID variables 
and address interview number are sufficient. 
However, other analyses, like those reported in 
Table 4.7 columns 2 through 8, do depend on 
correct calculation of the household interview 
number. 
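
The merge described here can be sketched with plain dictionaries. The miniature files and the `hh_id` and `person_id` field names are hypothetical; `ADier_no`, `HHinc` and `v4529` follow the chapter's naming. The result has one record per person per incident, with a single record for persons reporting no incidents.

```python
from collections import defaultdict

# Hypothetical miniature household, person and incident files.
households = [
    {"hh_id": "H001", "ADier_no": 3, "HHinc": 9},
    {"hh_id": "H002", "ADier_no": 3, "HHinc": 12},
]
persons = [
    {"hh_id": "H001", "person_id": "01", "ADier_no": 3, "age": 34},
    {"hh_id": "H001", "person_id": "02", "ADier_no": 3, "age": 36},
    {"hh_id": "H002", "person_id": "01", "ADier_no": 3, "age": 51},
]
incidents = [
    {"hh_id": "H001", "person_id": "02", "ADier_no": 3, "v4529": 13},
    {"hh_id": "H001", "person_id": "02", "ADier_no": 3, "v4529": 11},
]


def merge_files(households, persons, incidents):
    """Join the three files on household ID, person ID and ADier_no."""
    hh_index = {(h["hh_id"], h["ADier_no"]): h for h in households}
    inc_index = defaultdict(list)
    for inc in incidents:
        key = (inc["hh_id"], inc["person_id"], inc["ADier_no"])
        inc_index[key].append(inc)

    merged = []
    for p in persons:
        hh = hh_index[(p["hh_id"], p["ADier_no"])]
        key = (p["hh_id"], p["person_id"], p["ADier_no"])
        # Persons without incidents still get one record (v4529 missing).
        for inc in inc_index.get(key, []) or [None]:
            row = {**hh, **p}
            row["v4529"] = inc["v4529"] if inc else None
            merged.append(row)
    return merged


analysis_file = merge_files(households, persons, incidents)
```

Nothing in this join requires the household interview number; it would be needed only if the records were then split or modeled by HHier_no.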

The NCVS provides researchers with the 
opportunity for answering numerous research 
questions that cannot be examined using cross-sectional
data, including issues related to repeat
victimization, which are paramount to both the- 
ory building and crime control policies. When 
using these data, it is important to recognize 
that the sampled units in the NCVS consist of 
residential addresses, not households or indi- 
viduals. When a household moves out of a sam- 
ple address and a new household moves in, 
the new household replaces the old household. 
An important conceptual distinction illumi- 
nated in the chapter differentiates between the 
(scheduled) interview number of the address 
and the (scheduled) interview number of the 
household. These coincide for nonreplacement 
households, but the household interview num- 


ber is less than the address interview number 
for replacement households. These and other 
features of the rotating panel design of the 
NCVS must be taken into account, and they increase
the complexity of preparing the data for longi-
tudinal analysis.


Glossary 


ADier_no Address interview number (vari- 
able name) 


base year Year when interview occurred (base 
year) 


“by” variable Variable used to match cases in 
two or more files to be merged 


HHier_no Household interview number (vari- 
able name) 


Panel Subdivision of the NCVS sample rota- 
tion. Each rotation is divided into six panels. 
Interviews of respondents in each panel begin at 
one-month intervals. January or July for panel 1, 
February or August for panel 2.... 


Reference period Time interval six months 
prior to day 1 of the month the interview 
occurred. Respondents are asked to report vic- 
timizations that occurred during the reference 
period 


Rotation Subdivision of each NCVS sample. 
Each sample is divided into six rotations. 
Interviewing of successive rotations begins at 
six-month intervals 


Sample Addresses identified for inclusion in 
the NCVS. A new sample is selected every three 
years. Samples are designated by integer num- 
bers prefixed with the letter J 


Scrambled Control Number (SCN) Numeric 
address identification code, defined within 
sample numbers 


TIS Time in sample, a synonym for ADier_no




References 


BJS (2006). Serious violent crime levels declined 
since 1993. Online summary report
(http://www.ojp.gov/bjs/glance/cv2.htm), accessed April 17,
2006. 

Bowers, K.J. and Johnson, S.D. (2005). Domestic bur- 
glary repeats and space-time clusters. European 
Journal of Criminology, 2: 67-92. 

Conaway, M.R. and Lohr, S.L. (1994). A longitudinal 
analysis of factors associated with reporting vio- 
lent crimes to the police. Journal of Quantitative 
Criminology, 10: 23-39. 

Farrell, G., Sousa, W.H., and Weisel, D.L. (2002). The 
time window effect in the measurement of repeat 
victimization: A methodology for its measurement 
and an empirical study. In N. Tilley (ed.), Analysis 
for Crime Prevention. Vol. 14, Crime Prevention 
Studies. Monsey, NY: Criminal Justice Press. 

Farrell, G., Tseloni, A., and Pease, K. (2005). Repeat 
victimization in the ICVS and the NCVS. Crime 
Prevention and Community Safety, 7: 7-18. 

Gabor, T. and Mata, F. (2004). Victimization and 
repeat victimization over the life span: A predic- 
tive study and implications for policy. Interna- 
tional Review of Victimology, 10: 193-221. 

Gallagher, C.A. (2005). Injury recurrence among 
untreated and medically treated victims of 
violence in the USA. Social Science & Medicine, 
60: 627-635. 

Lauritsen, J.L. (2001). The social ecology of
violent victimization: Individual and contextual 
effects in the NCVS. Journal of Quantitative Crim- 
inology, 17: 3-32. 

Lauritsen, J.L. and Davis Quinet, K.F. (1995). 
Repeat victimization among adolescents and
young adults. Journal of Quantitative Criminology,
11: 143-166. 

Laycock, G. (2001). Hypothesis-based research: 
The repeat victimization story. Criminal Justice, 
1: 59-82. 

Menard, S. (2000). The ‘normality’ of repeat victim- 
ization from adolescence through early adulthood. 
Justice Quarterly, 17: 543-574. 

Outlaw, M.S., Ruback, R.B., and Britt, C. (2002). 
Repeat and multiple victimizations: the role of 
individual and contextual factors. Violence and 
Victims, 17: 187-204. 

Stevens, T.N., Ruggiero, K.J., Kilpatrick, D.G., 
Resnick, H.S., and Saunders, B.E. (2005). 
Variables differentiating singly and multiply 
victimized youth: results from the National 
Survey of Adolescents and implications for 
secondary prevention. Child Maltreatment, 10: 
211-223. 

Tuma, N.B. and Hannan, M.T. (1984). Social Dynamics.
Orlando, FL: Academic Press.

US Dept. of Justice, Bureau of Justice Statistics. 
NATIONAL CRIME VICTIMIZATION SURVEY, 
1992-2004 [Computer file]. Conducted by U.S. 
Dept. of Commerce, Bureau of the Census. Ann 
Arbor, MI: Inter-university Consortium for Politi- 
cal and Social Research [producer and distributor], 
2006. 

Xie, M., Pogarsky, G., Lynch, J.P., and McDowall, D.
(2006). Prior police contact and subsequent victim 
reporting: results from the NCVS. Justice Quar- 
terly, 23(4): 481-501. 

Ybarra, L.M.R. and Lohr, S.L. (2002). Estimates of 
repeat victimization using the National Crime Victimization
Survey. Journal of Quantitative Criminology,
18: 1-21.




Chapter 5


The Millennium Cohort Study 
and mature national birth cohorts 
in Britain 
Heather E. Joshi 


The national birth cohort studies, pioneered in 
Britain, have been rated of enormous impor- 
tance for both scientific and policy under- 
standings of human behavior. The Millennium 
Cohort Study (MCS) is the latest of this series 
of prospective studies of the life course, from 
1946, 1958 and 1970, and builds upon them. 
It adds to the portfolio of longitudinal data 
available for secondary analysis on the United 
Kingdom, and adds to the possibilities of cross- 
cohort and cross-national comparisons. 


1 Introduction 


Large-scale longitudinal surveys follow indi- 
viduals through time to chart their experience 
of political, social, demographic and economic 
change. Besides such social monitoring, they 
can also investigate hypotheses about the long- 
term causes and consequences of experiences, 
such as disease or educational attainment. 
The expense of collecting and maintaining 
these databases often means that they are used 
to serve many purposes and are utilized for 
secondary analysis beyond the ideas or imagi- 
nation of the originators. The prospects of mul- 
tiple uses, and the difficulty of knowing what 
topics will interest researchers 25 or 50 years 


down the line, argue for a broad coverage; on
the other hand, studying well-specified hypotheses
in depth points towards selectivity, given
the twin constraints on the cost of data col- 
lection and respondent burden. The Millen- 
nium Cohort Study is designed from the outset 
(unlike its forerunners) to be a multipurpose, 
longitudinal research resource, with scope for 
analysis that should interest readers of many 
disciplines. 

Section 2 outlines some features of the three 
national birth cohort studies started in Britain 
before 2000 and still ongoing, which formed 
a template for the study of the Child of the 
Century (as MCS is known in the field). Other 
major longitudinal data resources such as the 
ONS Longitudinal Study, the British House- 
hold Panel Study and the Avon Longitudinal 
Study of Parents and Children are left outside 
its scope. Section 2 introduces the 1946 birth 
cohort, followed by the more closely entwined 
histories of the 1958 and 1970 cohort studies, 
which are more widely available for analysis. 
The UK Millennium Cohort Study (MCS) is 
described in greater detail in Section 3, cover- 
ing its establishment, design and content from 
2000 to 2008. Section 4 discusses the analysis 
Table 5.1 British national birth cohort studies: origins and access

| | MRC National Survey of Health and Development (NSHD) | National Child Development Study (NCDS) | British Birth Cohort Study (BCS70) | Millennium Cohort Study (MCS) |
| Birth year | 1946 | 1958 | 1970 | 2000-1 |
| Initial principal investigators | James Douglas, 1946-79 | Neville Butler, 1958-69; Mia Kellmer Pringle, 1965-79 | Geoffrey and Roma Chamberlain, 1970-72; Neville Butler, 1973-91 | Heather Joshi, 2000- |
| Successors | John Colley, 1979-84; Michael Wadsworth, 1984-2006 | Ronald Davie, 1979-85; John Fox, 1985-89; John Bynner, 1989-2004; Jane Elliott, 2004- | John Bynner, 1991-2004; Jane Elliott, 2004- | |
| Initial purpose | Survey of maternity services | Perinatal mortality survey | Perinatal conditions with follow-up intended | Multipurpose, multidisciplinary, longitudinal study |
| Initial sponsor | Population Investigation Committee and Royal Commission on Population, Nuffield Foundation, National Birthday Trust Fund | National Birthday Trust, Royal College of Obstetricians and Gynaecologists | National Birthday Trust, Royal College of Obstetricians and Gynaecologists | Economic and Social Research Council, UK government departments |
| Current funding | Medical Research Council for core funding, grants from other sources | ESRC (+MRC*), US NICHD for second generation in 1991, etc. | ESRC (+NRDC and ESF in 2004) | As above |
| Current purpose | Health over the life course, ageing and its precursors | Multipurpose studies of the life course, assessment of biomedical outcomes and risk factors | Multipurpose studies of the life course | Multipurpose studies of the life course |
| Availability of data to other researchers | May be analyzed by collaborators of the study team. Access under review | NCDS, BCS70 and MCS: Anonymized datasets available from the UK Data Archive. Biomedical, geographical and other sensitive or disclosive data on special conditions | | |

* 1958 Cohort Biomedical Study, 2002-7, Christine Power, David Strachan, Bynner/Joshi, Gillian Prior, extended on Wellcome
Trust funding to establish a genetic data resource for medical research.


Table 5.2 British national birth cohort studies: research design

| | MRC National Survey of Health and Development (NSHD) | National Child Development Study (NCDS) | British Birth Cohort Study (BCS70) | Millennium Cohort Study (MCS) |
| Criterion | Born in a week of March 1946 | Born in a week of March 1958 | Born in a week of April 1970 | Born over a year* and living in a sampled ward at 9 months |
| Geographical coverage | England, Wales and Scotland (Great Britain) | Great Britain | UK, but only GB followed up | UK (includes Northern Ireland) |
| Size of initial sample | 5362. Legitimate singletons selected from data on 13,687 out of 16,695 births in the week | 17,414. Data collected from 17,733 births in the week. Immigrants with sample birth dates added up to age 16 | 16,571. 17,052 live births in GB. Immigrants with sample birth dates added up to age 16 | 18,818. Selected from 27,201 children with relevant Child Benefit address and birth months |
| Weighting | Follow-up of one-in-four children of wives of manual workers and all births to wives of non-manual and agricultural workers | None | None | Over-representation of wards with high child poverty, in Celtic countries, and, in England, with minority ethnic concentration |
| Frequency of follow-up | 20 core follow-ups in 53 years: approximately every two years until age 26, then 31, 36, 43, 53, 60 | 7 core follow-ups in 47 years: 7, 11, 16, 23, 33, 42, 46. Now planned for 4-year intervals, alternating face to face and telephone, unless further funds become available | 6 core follow-ups in 34 years: 5, 10, 16, 26, 30, 34. Now planned for 4-year intervals, alternating face to face and telephone, unless further funds become available | Every two years in childhood; 3, 5, 7 planned |

* MCS sample birthdates between September 2000 and August 2001 in England and Wales and 24/11/00-11/01/02 in Scotland
and Northern Ireland

Table 5.3 British national birth cohort studies: data collection and cohort maintenance

| | MRC National Survey of Health and Development (NSHD) | National Child Development Study (NCDS) | British Birth Cohort Study (BCS70) | Millennium Cohort Study (MCS) |
| Sample size at 36, 33 or 30 | 3322 (36, 1982) | 11,407 (33, 1991) | 11,261 (30, 2000) | NA |
| % of eligible | 75.5 | 71.6 | 71.5 | 72 (initial sample) |
| Date of latest contact | 1999 | 2004-5 | 2004-5 | 2003-5 |
| Achieved sample at latest contact | 3035 | 9530 | 9665 | 15,600 |
| Informants | Mothers, cohort child/member, teachers, school medical officers, school nurses, youth employment officers. Second generation: cohort members or partners when first born aged 4 years and 8 years. All women in cohort age 47-54 | Mothers, cohort child/member, teachers. School medical officer. Partners at 33. Second generation study on subsample at 33: data from cohort's children and their mothers | Mothers, cohort child/member, teachers. School medical officer. Second generation study on subsample at 34: data from cohort's children and their mothers | Mothers, fathers, cohort child, teachers, health visitors |
| Mode | Interviews, initially by health visitor, postal questionnaires, nurse interviewer visits. CAPI in 1999. Clinic data collection feasibility study currently taking place | Interviews, initially by health visitor, paper self-completions, nurse visit at 42 (CAPI/CASI since 2000). Alternate contacts by telephone starting 2004 | Interviews, initially by health visitor, paper self-completions, postal questionnaire at 26, CAPI/CASI 2000. Alternate contacts by telephone starting 2008 | CAPI and CASI interviews from outset |

| | NSHD, NCDS and BCS70 | MCS |
| Tracing | Team of tracers consult various registers, e.g. electoral roll, phone directory, vehicle and driving licence authority. National Health Service Central Register will forward enquiries to GP. Interviewers attempt to trace movers in field. Stable addresses collected at interview, and followed up on phone by tracers if needed. Cohort members invited to keep in touch. Week of birth a key identifier | As other studies but also includes updates on addresses from Department for Work and Pensions |
| Feedback | Annual birthday card with feedback information and invitation to update address. Cohort member website | Feedback documents sent in batches around times of birthdays |


potential of the latest study. To complete this 
introduction, Tables 5.1, 5.2 and 5.3 present a 
cross-cutting synopsis of the history, data acces- 
sibility, design and data collection methods of 
the four national birth cohort studies. 


2 The heritage of birth cohort 
studies in Britain 


2.1 The MRC National Survey of Health 
and Development: 1946 Cohort 


As shown in Table 5.1, the first national survey 
of births took place in 1946 addressing the state 
of maternity services on the eve of the intro- 
duction of the National Health Service (Joint 
Committee, 1948). Because of the urgency of 
the situation and cost constraints, the survey 
took all the births in one week (in March), and
data were collected by health visitors in the home a
few weeks after delivery. The maternity study
covered all the births in Great Britain where 
the authorities cooperated (13,687 out of the 
possible total of 16,695, Table 5.2). It made rec- 
ommendations about maternity care published 


in the Joint Committee’s report (1948). The 
Director, Dr James Douglas, initiated a follow-up
in 1948 of a weighted subsample of 5362
cases from the original 13,687. Unforeseen at 
the time, this continues into the 21st century. 
There have been 20 follow-ups across child- 
hood and adulthood up to 2000 (see Tables 5.2 
and 5.3 and Wadsworth et al., 2003). A further 
sweep is being prepared for age 60 in 2006. 
There were approximately 3600 cohort mem- 
bers remaining in the study at the 1999 inter- 
view. The 1946 study has pioneered methods of 
keeping in touch with cohort members, includ- 
ing sending them a birthday card. 

The study is known as the MRC National 
Survey of Health and Development to reflect 
its major funding from the Medical Research 
Council, which since 1962 has permitted the 
continuation of the study over the cohort’s adult 
years, directed since 1984 by Professor Michael 
Wadsworth at University College, London. This 
funding of data collection and analysis together 
follows a model customary for medical research 
in Britain. The dataset is not conceived as a 
general data resource and there are features of
the consent given by informants which may 
limit its uses to specified purposes. Researchers 
outside the MRC Unit at University College, 
London, have been able to work on the study 
via collaborative arrangements, and wider data 
access is under review. 

Classic findings from the childhood years 
are reported in books by Douglas (1964) and 
Douglas et al. (1968). Further findings have been 
synthesized by Wadsworth (1991) and, for exam- 
ple, presented in collections of papers on life 
course epidemiology and women’s health (Kuh 
and Ben-Shlomo (eds), 2004, and Kuh and
Hardy (eds), 2002). Further information about 
the study and publications arising from it can 
be found on its website (www.nshd.mrc.ac.uk). 
The volume edited by Ferri, Bynner and 
Wadsworth (2003) brings together findings on 
the 1946 cohort up to age 43, with similar 
results up to age 42 and 30 from the 1958 
and 1970 birth cohorts, respectively. It also 
summarizes the methodology of all three studies.


2.2 The National Child Development Study: 
1958 Birth Cohort 


The 1958 cohort study also features in Tables 
5.1, 5.2 and 5.3. The birth study was again about 
perinatal conditions (Ferri, 1998). Like the 1946 
exercise it took all the births in a week of March 
as the target and health visitors collected the 
data. There was also no initial plan to estab- 
lish a longitudinal study (despite the exam- 
ple of eight follow-ups of the 1946 children by 
then). That opportunity came in 1965 when the 
cohort sample was revived to provide evidence 
for the Plowden Enquiry on primary education, 
which reported in 1967. This time there was 
no subsampling, indeed the cohort was aug- 
mented. In 1965, 15,425 seven-year-olds were 
found, of whom 15,051 had been in the ini- 
tial sample of 17,415 and 374 were subsequent 
immigrants, born in the survey week, identified 
by their birth dates in school records (Plewis 
et al., 2004). 


The survey became known as the National 
Child Development Study (NCDS) and was 
based under the joint direction of its founder, 
Neville Butler (by then Professor of Child 
Health at Bristol University), and Mia Kellmer 
Pringle, at the National Children’s Bureau from 
1965. Under the Bureau’s auspices, and with 
various ad hoc sources of funding, there were 
three childhood follow-ups: NCDS1 at age 7, 
NCDS2 at age 11, and NCDS3 at age 16 (Davie
et al., 1972, Fogelman, 1983). Each time immi- 
grants were recruited. Information was col- 
lected from mothers, teachers, school medical 
services and, at age 16, from the cohort mem- 
ber as well. Examination results were added. 
The data were made available to the research 
community via the ESRC Data Archive from 
1983. The National Children’s Bureau oversaw 
the collection of the first survey of the cohort 
as adults, NCDS4 at age 23. This round of data 
collection was largely funded by government 
departments but there was no commitment to 
longer term continuation. 

The size of the sample from which some 
information was collected was around 14,000 
at age 16 and somewhat over 12,000 at age 23, 
representing a longitudinal response rate out 
of the original cohort of 87% and 76% respec- 
tively (Plewis et al., 2004). The proportions of 
the original cohort with complete information 
are somewhat lower, depending on which vari- 
ables are of interest. 

In 1985 the National Child Development 
Study (as it is still known) transferred to the
Social Statistics Research Unit at City Univer- 
sity, directed by Professor John Fox, who was 
succeeded in 1989 by John Bynner. In 1998, 
Professor Bynner took the unit, and the two 
cohort studies for which it was by then respon- 
sible, to the Institute of Education, in the Uni- 
versity of London. The team still operates there, 
renamed as the Centre for Longitudinal Stud- 
ies (CLS). 

Back in the 1980s John Fox faced the 
challenge of raising funds for a fifth sweep 
(NCDS5). The Economic and Social Research
Council (ESRC) joined government departments 
to finance a survey in 1991 of cohort mem- 
bers at age 33 and their partners. Its design, 
as with future CLS surveys, was developed 
in consultation with the scientific community. 
It also included a survey of the children of
one-third of the cohort (and their mothers). 
This second generation study was financed by 
the US National Institute for Child Health and 
Development (NICHD) and contained develop- 
mental measures comparable with the second 
generation module of the National Longitudinal 
Survey of Youth (NLSY) in the US. A public use 
data set of 11,407 cases with at least some infor- 
mation (10,986 of whom were also in the birth 
survey) was made available through the Data 
Archive, and its preliminary findings outlined 
in a ‘sourcebook’ (Ferri (ed.), 1993). 


2.3 The British Birth Cohort Study of 
1970 (BCS70)


During the 1990s the development of the 1958 
cohort became increasingly integrated with that 
of the 1970 cohort study, so this is a good point 
to bring in the next study featured in Tables 
5.1 to 5.3, and rewind history to 1970, when 
the third nationwide birth cohort study, covering
all births in a week of April 1970, was launched
under similar auspices to those of 1958 (Ferri, 1998).
Health visitors, again, collected data on 16,571 
births (out of a possible 17,287 in Great Britain, 
Plewis et al., 2004). The birth survey also 
included, for the first time, some 600 births in 
Northern Ireland, but they were not followed 
up in the deteriorating political situation. This 
time there was always an intention to follow up 
and subsamples were followed up at 22 and 42 
months. The Child Health and Education Study 
(CHES) was established to pursue a broad front 
of social and educational issues, as well as 
medical, in the Department of Child Health, 
University of Bristol, under the direction of 
Neville Butler. There was a full survey at age 5 
(Osborn et al., 1984) and another CHES survey 


at 10. The age 16 follow-up was conducted by 
a charity, the International Centre for Child 
Studies (ICCS), with funding from a large num- 
ber of sources, thanks to Neville Butler’s dual 
talents as a scientist and fundraiser. Known 
as Youthscan, this survey collected a great 
variety of data from the young people, their 
parents and schools, some of which remained 
inaccessible for a number of years due to a lack 
of resources for data management. 

In 1991 the unit at City University took 
responsibility for the BCS70 (as it became 
known) and for the deposit of anonymised data for
public use in the ESRC archive (Ekinsmyth 
et al., 1992). A one-in-ten sample was surveyed 
at age 21 on literacy and numeracy (and com- 
plemented by a similar 10% survey of the 1958 
cohort at age 37). In 1996 the opportunity of 
ESRC funding arose at short notice to approach 
the whole 1970 cohort, at age 26, with a postal 
questionnaire about their lives since leaving 
school. Given the short time available to trace 
addresses and follow up nonresponders, the 
9006 returns were not only interesting in their 
own right (Bynner et al., 1997) but encouraged 
the view that this cohort still had potential as a 
longitudinal data resource. 


2.4 NCDS and BCS70 under 
shared direction 


Meanwhile several reviews of longitudinal data 
endorsed the unique strengths of the accumu- 
lating resource and recognized that uncertain- 
ties surrounding future funding had not helped 
rational planning of good data quality. In 1998 
the ESRC established a national strategy for lon- 
gitudinal data. It was agreed that the 1958 and 
1970 cohorts should follow a “forward plan” 
of a sweep every four years, starting with a 
very similar interview survey of both of them 
in 2000. Thereafter, alternate data collections,
starting with NCDS in 2004, would be through 
the cheaper telephone mode, unless additional 
funding is raised. In 2000, the Centre for Lon- 
gitudinal Studies won the contract for the first 
survey of the Millennium Cohort study (see
fourth column of Tables 5.1, 5.2 and 5.3 and 
below). In 2004, the ESRC consolidated its 
funding of the collection and development of 
the three cohort studies of 1958, 1970 and 2000, 
up to 2010, making CLS an ESRC Resource 
Centre. A governance structure was set up to 
combine the advice of scientific experts with a 
steering group representing the major funders 
and reporting to higher level bodies, including 
the ESRC committee for National Strategy on 
Longitudinal Data. 

The “2000” interview surveys of both the 
1970 and 1958 cohorts (which actually started
in December 1999) used a more or less common
interview schedule, designed also to permit
comparisons between the two cohorts at ages
30 and 42 respectively, and with the data
gathered on the 1958 cohort at 33 (Bynner et al., 
2000). The achieved sample sizes, each well 
over 11,000, meant that BCS70 had recovered 
cases since the postal sweep and that there had 
been relatively little net attrition in NCDS. An 
overview of the results, also making compar- 
isons with the 1946 cohort, can be found in 
Ferri et al. (eds) (2003), as mentioned above.

The next major data collection from the 
1958 cohort was in 2003-4 around age 45, 
when biomedical data and specimens were 
collected by research nurses. Measurements 
include blood pressure, lung function, blood 
and saliva samples, hearing and vision tests, 
height, weight and psychological indicators. 
This study was initiated by Professors Christine 
Power and David Strachan, and funded by the 
Medical Research Council, to assess biomedi- 
cal outcomes and risk factors. The project has a 
team of specialist collaborators, medical scien- 
tists who, on the MRC model, are funded to 
analyze the results in the first instance. There 
is intended to be some form of wider access 
to the data for other analysts, probably after 
2007. The biomedical project has also estab- 
lished a collection of genetic evidence derived 
from immortalized cell lines, which have been 


generated under funding from the Wellcome 
Trust. Access to this material, for purposes of 
medical research, is controlled by the MRC WT 
Oversight Committee. 

In 2004-5 there was another interview survey
of the 1970 cohort at age 34, enhanced
by a survey of the children of one cohort 
member in two. The mother and child survey 
was financed by funds raised by the National 
Research and Development Centre on Adult 
Literacy and Numeracy (NRDC), largely from 
the European Social Fund. NRDC also sup- 
plemented the resources available from ESRC 
for the survey of BCS70 adults at 34 to per- 
mit a module on dyslexia. The second genera- 
tion study administered cognitive assessments 
to children old enough to attempt them, like 
the NCDS second generation study at 33. The 
number of children surveyed, just over 5000, 
reached expectations but overall response from 
adults in the age 34 survey fell below 10,000 
(74% of the restricted target issue for field- 
work) reflecting difficulties in finding movers’ 
addresses. 

There was, at about the same time in 2005, 
a telephone survey of the 1958 cohort lasting 
about 30 minutes and therefore collecting less 
than their previous 90-minute interview, which 
showed a similar level of response, n = 9534, 
to the survey of BCS70 that year (9665, see 
Table 5.3), but a higher proportional success 
rate (81%) in reaching addresses that were actu- 
ally issued for fieldwork. 
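
The response figures quoted for the two 2004-5 surveys imply approximate sizes for the samples issued to the field. The following is a rough back-calculation only, assuming that each quoted percentage is simply achieved interviews divided by cases issued (the text does not define the rates more precisely); the function name is hypothetical, and because the percentages are rounded the results are accurate only to within a hundred or so cases.

```python
def implied_issued(achieved: int, response_pct: float) -> int:
    """Back out the approximate number of cases issued for fieldwork.

    Assumes response_pct = 100 * achieved / issued, which is an
    interpretation of the quoted rates, not a documented definition.
    """
    if not 0 < response_pct <= 100:
        raise ValueError("response_pct must be in (0, 100]")
    return round(achieved / (response_pct / 100.0))


# Figures quoted in the text and in Table 5.3.
ncds_issued = implied_issued(9534, 81)   # 1958 cohort, telephone survey
bcs70_issued = implied_issued(9665, 74)  # 1970 cohort, age 34 interview

print(f"NCDS issued (approx.):  {ncds_issued}")
print(f"BCS70 issued (approx.): {bcs70_issued}")
```

On these assumptions the issued samples come out at roughly 11,800 for the 1958 cohort and 13,100 for the 1970 cohort, which is consistent with the statement that the two surveys achieved similar numbers of interviews from different success rates.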

In 2004, after John Bynner’s retirement, 
Dr Jane Elliott took over as the Research Direc- 
tor of the NCDS and BCS70. In 2006 she was 
presiding over the preparation of material from 
the 2004-5 studies for deposit in the UK Data 
Archive and preparing for the next round of 
surveys on these adult cohorts in 2008. Given 
the growing mass of information collected over
these cohort members' lifetimes (well over
13,000 variables) consideration is being given 
to improving the disclosure control associated 
with data access. 


3 The ESRC Millennium
Cohort Study 


3.1 Gestation and metamorphosis of the 
first survey 


The history of the Millennium Cohort Study 
starts with the nonoccurrence in 1982 of any 
fourth in a twelve-year sequence of national 
birth cohort studies. The political and economic 
climate then did not favor large investments in 
social data collection. It was to be another 16 
years before the ESRC adopted its longitudinal 
data strategy. Then the government decided it 
would include a new cohort study among the 
activities to celebrate the turn of the Millen- 
nium. The ESRC, which was to be the main fun- 
der, drew up a specification for the study. It was 
to have more social and economic content and 
less of a medical focus than the initial surveys 
of its predecessors. Data were to be collected 
by computer-assisted means by professional 
interviewers, rather than by imposing on the 
goodwill of health visitors, which was unlikely still to be
possible. The relative scarcity of trained interviewers
told against attempting to recruit all the
births of one week again. Another consideration 
in favor of spreading births throughout the year 
was that it would reveal any variation by sea- 
son of birth, excluded by the earlier design. It 
was stipulated that at least some of the cohort 
should be born in 2000 and that the first sur- 
vey should be within the first year of life, as 
close to age 6 months as possible. The invitation
to tender for the principal investigator (PI)
role was not published until late February 2000, 
with unusually short notice to submit. In May 
2000, it was announced that the Centre for Lon- 
gitudinal Studies had won the PI contract, with 
the author as MCS Director. But with the Millennial
year already nearly half over, it was
still necessary to specify the tender for field- 
work and consider bids for that before work 
could start in earnest. The National Centre for 
Social Research (NatCen) was appointed as the 
fieldwork contractor in October 2000. 


By that time there was little choice but to start 
fieldwork in 2001. The cohort births were fixed 
in the 12 months from September 1, 2000, and 
the interview age at 9 months, i.e., to start field- 
work in June 2001. Even so, the survey develop- 
ment phase, with two pilot surveys, proceeded 
at a breakneck speed, maintaining the tyrannical 
pace from which the study never seems to escape. 

Two major features of the survey design were 
established during the earlier months of the 
PI’s work. The first was the sampling scheme 
devised by Ian Plewis, which simultaneously 
permitted the over-representation of certain 
groups of particular scientific or policy interest, 
and provided a structure making it possible to 
analyze neighborhood and community effects on child
development and family wellbeing (Plewis, 
2004). The subpopulations of interest were chil- 
dren in poor families, children from minority 
ethnic groups, and, especially given supple- 
mentary funding from the devolved adminis- 
trations of these countries, the inhabitants of 
Scotland, Wales, and Northern Ireland. The 
geographical unit in which these populations 
were identified is the electoral ward, an admin- 
istrative entity relevant to elections (but not 
any other particular service provision) averag- 
ing around 5500 inhabitants. There are 11,000 
of these in the UK as a whole. There were 
data available on families receiving low-income 
benefits by ward in 1998, which were used to 
split each country into two strata, disadvantaged
and nondisadvantaged (sometimes rather misleadingly
labeled "Advantaged", since this comprises
all but the most disadvantaged). The
cut-off point for the child poverty index was
that the ward had more than 38.4% of the children
in a 1998 database living in "poor" families.
This cut-off (bottom quartile of wards in
England and Wales) accounted for nearly half 
the wards in the smaller countries of Wales, 
Scotland, and Northern Ireland, but less than 
one quarter in England. In England, where most 
of the minority ethnic groups live, there is 
a third stratum, actually identified before the
other two, of wards expected to contain high 
concentrations of ethnic minority population. 
The criterion had to be based on evidence 
from the 1991 census, which indicated places 
redrawn to 1998 boundaries, with more than 
30% of the population of Black or Asian ethnic 
identity in 1991 (where the median for these 
places was 51%). With two strata in each of 
the Celtic countries and three in England, there 
are nine strata, within which wards are selected 
with different known probabilities, offering the 
greatest over-representation to the ethnic stra- 
tum in England, followed by the disadvantaged 
stratum in Wales. 398 wards were selected and 
within them all births on specified birth dates 
(see below “Qualifying Dates of Birth”) were 
eligible for the survey, provided the child was 
resident in the ward at age 9 months. 
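
The stratification rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used by the study team: the `Ward` record, its field names and the function names are hypothetical, and the two cut-offs are simply the thresholds quoted in the text (more than 30% Black or Asian population in 1991 for the English ethnic stratum, which is identified first, and more than 38.4% of children in poor families for the disadvantaged stratum).

```python
from dataclasses import dataclass

POVERTY_CUTOFF = 38.4  # % of children in "poor" families (1998 data)
ETHNIC_CUTOFF = 30.0   # % Black or Asian population (1991 census)
COUNTRIES = ("England", "Wales", "Scotland", "Northern Ireland")


@dataclass(frozen=True)
class Ward:
    """Hypothetical ward record; field names are assumptions."""
    name: str
    country: str
    child_poverty_pct: float
    ethnic_minority_pct: float = 0.0


def stratum(ward: Ward) -> tuple:
    """Assign a ward to one of the nine (country, type) strata."""
    if ward.country not in COUNTRIES:
        raise ValueError(f"unknown country: {ward.country}")
    # The ethnic stratum exists in England only and takes precedence.
    if ward.country == "England" and ward.ethnic_minority_pct > ETHNIC_CUTOFF:
        return (ward.country, "ethnic")
    if ward.child_poverty_pct > POVERTY_CUTOFF:
        return (ward.country, "disadvantaged")
    return (ward.country, "advantaged")


def all_strata() -> list:
    """Enumerate the strata: three in England, two in each other country."""
    strata = []
    for country in COUNTRIES:
        if country == "England":
            strata.append((country, "ethnic"))
        strata.append((country, "disadvantaged"))
        strata.append((country, "advantaged"))
    return strata


print(len(all_strata()))  # number of strata in the design
print(stratum(Ward("Example ward", "Wales", child_poverty_pct=45.0)))
```

Checking the ethnic criterion before the poverty criterion mirrors the statement that the ethnic stratum was identified before the other two, so that an English ward meeting both criteria falls in the ethnic stratum.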

The second decision involved sampling 
frames and the way in which respondents could 
be approached. The first sampling frame was 
the list of wards on boundaries that could be 
matched to child poverty and ethnic data, but 
the second had to locate the children in
selected wards who had the selected birthdays. 
One possible source of names and addresses 
and dates of birth was the registration of 
births. This avenue was abandoned because the 
recruitment procedures involved parents pro- 
viding a written opt-in consent before their 
names could be released to the survey. There 
would not have been time to follow-up ini- 
tial nonresponders and there was concern 
that an opt-in would bias the sample away 
from people with poor language or literacy 
skills. 

An alternative sampling frame was offered 
by the then Department of Social Security, 
which was joining the consortium of govern- 
ment departments offering major supplementa- 
tion to the funding of the survey. This permitted 
access to the Child Benefit Register. Child Bene- 
fit is almost universally claimed and its records 
contain both a date of birth and an address 


likely to be reasonably up to date. Furthermore, 
the Department operated an opt-out rather than 
an opt-in approach when using the records to 
recruit research samples. The Department sent 
a letter to parents with an eligible address say- 
ing that the Department would pass their name 
and address on to the survey, unless the par- 
ent provided written or telephoned notice that 
they would prefer to opt out. In the event, 7% 
opted out, a further 3% of cases were held 
back as potentially sensitive or already issued 
to another survey, which combined with 13% 
refusals in the field compares favorably with 
the 30% or so pre-fieldwork loss feared by the 
opt-in route. The opt-out route raised ques- 
tions when the plan was presented to the NHS 
Research Ethics Committee, who felt it might be 
coercive. It was agreed that interviewers should 
establish that the families really were giving 
informed consent at an introductory visit before 
proceeding. 

The confirmation of government funding to 
supplement the resources originally committed 
by the ESRC to the Millennium Cohort dur- 
ing its design phase increased the scope of the 
study in several ways. One was to increase 
the target achieved sample from 15,000 origi- 
nally specified to over 20,000. Funding from the 
National Assembly of Wales doubled the target 
number of cases in that country to 3000. There 
were also boosts of 1000 and 500 in Scotland 
and Northern Ireland respectively, funded by 
the devolved governments of those countries. 
The sample of disadvantaged wards in England 
was boosted to provide more control cases 
for the National Evaluation of Sure Start in 
England, an integrated program of services for 
young children. Government funding also paid 
for an extension of the length of contact time 
with families by 15 minutes in the first sur- 
vey, to 105 minutes overall. The government 
funding permitted several enhancements to the 
interview data, notably the linkage of exter- 
nal data (from hospital episode statistics, birth 
registration) for those (the vast majority) who 




gave permission and could be matched. Other 
enhancements include: a postal substudy of 
mothers whose cohort child was born through 
assisted fertility; another postal survey of health 
visitors about the services available to families 
with young children in the study wards; and the 
assembly of other ecological indicators about 
those places. Crucially, and unlike the ESRC 
award, the government funding provided funds 
for in-house reporting and analyzing the study 
up until 2005, during which period it also ear- 
marked funds to contribute to the collection of 
data for the second survey. 


3.2 Qualifying dates of birth 


Before the story of the Millennium Cohort can 
proceed it is necessary to set out the dates 
of birth that qualify a child for membership
of the cohort. For families resident in
England and Wales, they run from Septem- 
ber 1, 2000 to August 31, 2001, which ful- 
fills the requirement of being spread over a 
year, coincidentally the dates normally sep- 
arating the age cohorts passing through the 
school system. The survey could not start with 
September births in Scotland and Northern Ire- 
land, because of an existing government sur- 
vey drawing respondents from those families. 
It was decided to avoid respondent overload 
by postponing the start of MCS fieldwork. The 
cohort started in these countries with births 
from November 24, 2000, the start of the fourth 
batch of 4-week birth cohorts who formed 
the waves in which the sample was issued 
(Shepherd et al., 2004). The closing date for 
the birth dates for the cohort in these coun- 
tries should have been November 23, 2001, but 
the sample of birth dates was extended, by 
an extra six weeks to January 11, 2002. This 
decision was taken in mid-fieldwork to com- 
pensate for the shortfall of cases being issued. 
This resulted from the actual number of births 
falling below the number expected when the 
sample was designed. In fact, 2001 turned out 
to be an all-time low for British fertility. By the 


time the “birth dearth” became evident it was 
not possible to boost the sample by selecting 
more wards; the only option was to select 
more birth dates. The sponsors in England and 
Wales were content to live with the short- 
fall in numbers, but the samples in Scotland 
and Northern Ireland were anyway smaller, so 
the option of extending birth dates was taken 
up. The rules for school entry and dates of 
school years in Scotland and Northern Ireland 
are not the same as in England and Wales, so 
the feature that members of the cohort do not 
belong to a single school year is magnified, but 
not created, by these variations in the cohort 
birth dates. 
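
The qualifying dates set out above lend themselves to a simple eligibility check. The sketch below is illustrative only: the function and variable names are hypothetical, the check covers date of birth and country alone (residence in a sampled ward at 9 months is not modeled), and the batch calculation assumes that the 4-week birth cohorts were consecutive 28-day blocks counted from September 1, 2000, an assumption that is consistent with November 24, 2000 being the start of the fourth batch.

```python
from datetime import date, timedelta

# Qualifying birth-date windows as stated in the text (inclusive).
BIRTH_WINDOWS = {
    "England": (date(2000, 9, 1), date(2001, 8, 31)),
    "Wales": (date(2000, 9, 1), date(2001, 8, 31)),
    "Scotland": (date(2000, 11, 24), date(2002, 1, 11)),
    "Northern Ireland": (date(2000, 11, 24), date(2002, 1, 11)),
}

FIRST_BATCH_START = date(2000, 9, 1)
BATCH_LENGTH_DAYS = 28  # assumed: consecutive 4-week batches


def is_qualifying_birth(dob: date, country: str) -> bool:
    """True if the date of birth falls in the window for that country."""
    start, end = BIRTH_WINDOWS[country]
    return start <= dob <= end


def batch_start(k: int) -> date:
    """Start date of the k-th 4-week batch (k counted from 1)."""
    if k < 1:
        raise ValueError("batch number must be 1 or greater")
    return FIRST_BATCH_START + timedelta(days=BATCH_LENGTH_DAYS * (k - 1))


print(batch_start(4))
print(is_qualifying_birth(date(2000, 10, 1), "England"))
print(is_qualifying_birth(date(2000, 10, 1), "Scotland"))
```

A child born on October 1, 2000 therefore qualifies in England or Wales but not in Scotland or Northern Ireland, while a child born in December 2001 qualifies only in the latter two countries.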


3.3. Data collection over time 


With its confirmation in late 2001 of funding for 
the second survey at age 3, ESRC ensured that 
the study would, as intended, become longitu- 
dinal. The institution of CLS as a Resource Cen- 
tre in 2004, mentioned above, provided funds 
for the third and fourth surveys at age 5 and 
age 7. The fieldwork for these two surveys 
mostly occurs in 2006 and 2008 respectively. 
Any further follow-up, which might preserve 
the two-year rhythm, with a survey at age 9, 
or mimic the 10-year-old survey of BCS70 (in 
2011) or the 11-year-old survey of NCDS (2012), 
has yet to be decided, but there is, in principle, 
the intention to follow the cohort further into 
adulthood. 

The fact that the sample birth dates extend 
over more than 16 months has implications for 
the duration of fieldwork in each survey, sub- 
sequent data release, and the frequency with 
which they can be repeated. In the first survey, 
interviews mostly took place as intended, very 
close to the time the child was aged 9 months, 
but this still meant that fieldwork spread from 
June 2001 to January 2003. By that time prepa- 
rations for Sweep 2 were well in hand. Designed 
to reach families when the child was between 
36 and 39 months of age, fieldwork should have 
run from September 2003 to the end of February 
2005, but the finish slipped to the beginning of
May 2005. Fieldwork periods in the next two 
sweeps, age 5 and age 7, will be less spread out 
in England and Wales, by dint of interviewing 
at a greater spread of child ages, but in Scotland 
and Northern Ireland the split of the sample 
over two school years will extend the collection 
and complicate the release of data. A prelim- 
inary version of the first survey (MCS1) was 
released to the UK Data Archive in mid 2003. 
Another version was released in the spring of 
2006 with comparable data from Sweep 2, with 
variable names reissued to facilitate longitudi- 
nal analysis. 


3.4 Content 


The broad topics covered by the first four sur- 
veys of MCS are summarized in Table 5.4. 
MCS1 conducted interviews with each resi- 
dent parent, taking many questions from those 
recently put to members of the 1958 and 1970 


cohorts. The main informant was almost invari- 
ably the natural mother, supplemented by the 
main informant’s partner, usually the natural 
father if he was present and willing. Each parent 
also had a self-completion instrument contain- 
ing some more sensitive material. The whole 
encounter took 105 minutes on average, about 
75 minutes with the mother and 30 with the 
father. She was asked about a number of topics 
including pregnancy and delivery, which were 
not repeated for the father. This is a long time 
for an interviewer visit (or visits), taking even 
longer when questions had to be translated, but 
on the whole parents and interviewers seemed 
to enjoy talking about the baby. 

The age 3 survey repeated the parental
interviews with similar but not identical
content, over a somewhat shorter total span of
time, because this was the first survey of the
cohort to take direct measurements from the
cohort children themselves. These consisted of
anthropometry and cognitive assessments
(details in Table 5.4), which the field force
was specially briefed to administer. Similar
data collection has also been successfully
carried out by survey interviewers in the
second generation studies of NCDS and BCS70.
Another innovation at age 3 was collection of
data about older siblings, and in England from
some of these children themselves, if aged
10-15, in a self-completion questionnaire.
Broadly similar assessments continue in the
age 5 survey and there are plans for age 7,
with age-appropriate changes. Questions about
older siblings were repeated, probably for the
last time at 5, but not the self-completion.
From age 5, anthropometry includes the
measurement of the waist as well as height and
weight, in the belief that 5-year-olds might
be more cooperative than 3-year-olds at
yielding this key information for the study of
obesity. At age 3 the children also provided a
sample of oral fluid, for an investigation of
immunities to test the hygiene hypothesis
about allergy and asthma, strictly not for any
other purpose. There is no further specimen
collection in the surveys at age 5 and age 7,
but a biomedical follow-up at a later age is
under consideration. The final elements of the
age 3 survey (of which vestiges only remain at
age 5) were two observations made by the
interviewers: conditions inside the home and
in the neighborhood outside.


Table 5.4 UK Millennium Cohort Study: content of first four surveys

Respondent     Mode                   Modules and subtopics                      Surveys, by age of cohort
                                                                                 (mother/main | father/partner)
Main/Partner   Interview              Household (inc. ethnicity and language)    9m,3,5,7
Parents        Interview              Non-resident parents                       9m,3,5,7
                                      Pregnancy, labour and delivery             9m
                                      Father’s involvement with child            9m,3,5,7
                                      Child’s health and development             9m,3,5,7
                                      Childcare                                  9m,3,5,7
                                      Early education                            3,5
                                      School                                     5,7 | 5,7
                                      Grandparents and friends                   9m,3,5,7 | 9m,3,5,7
                                      Parent’s health                            9m,3,5,7 | 9m,3,5,7
                                      Employment                                 9m,3,5,7 | 9m,3,5,7
                                      Family income                              9m,3,5,7
                                      Parental education/skills                  9m,3,5,7 | 9m,3,5,7
                                      Housing and local area                     9m,3,5,7
                                      Interests and time with child              9m,3,5,7 | 9m,3,5,7
                                      Older siblings                             3,5
               Self-completion        Child’s temperament and behavior           9m,3,5,7
                                      (inc. SDQ from 3)
                                      Relationship with partner                  9m,3,5,7 | 9m,3,5,7
                                      Previous relationships                     9m,3,5,7 | 9m,3,5,7
                                      Domestic tasks                             9m,3,5,7
                                      Parenting                                  9m,3,5,7 | 9m,3,5,7
                                      Previous pregnancies                       9m
                                      Children living elsewhere                  9m,3,5,7
                                      Mental health                              9m,3,5,7 | 9m,3,5,7
                                      Drug use, domestic violence                3,5,7 | 3,5,7
                                      Alcohol problems                           9m,3,7? | 9m,3,7?
                                      Attitudes                                  9m,3,5,7 | 9m,3,5,7
Child          Cognitive assessments  Bracken Basic Concept Scale                3
                                      BAS naming vocabulary                      [illegible]
                                      BAS picture similarities                   [illegible]
                                      BAS pattern construction                   [illegible]
                                      Sally and Anne                             5
               Anthropometry          Height and weight                          [illegible]
                                      Waist circumference                        5, [illegible]
               Biological sample      Immunology of oral fluid                   3
Older sibling  Self-completion        (England only)                             3,5
Interviewer    Observations           Home environment                           3, [illegible]
                                      Neighborhood                               3

Notes
9m = 9 months. Other ages in years.
The modules are not all repeated in their entirety across sweeps.
Some of the original questions are only put in subsequent sweeps to new informants.
The exact composition of child assessment at Sweep 4 is not yet decided.
BAS is British Ability Scales; ‘Sally and Anne’ is a Theory of Mind test.

Although some broad themes in the parental 
questionnaire are shown in Table 5.4 as con- 
tinuing threads across each survey, the actual 
questions may not be identical or as many at 
each sweep. Some information is fed forward 
to ensure continuity. New informants, mostly 
new partners, are given extra questions to fill in 
essential data, which is not otherwise repeated. 
Details of the instrumentation for datasets in the 
public arena are available on the CLS website 
and from the UK Data Archive. 


3.5 Response and cohort maintenance 


While any survey is judged by its success at 
achieving a high and unbiased response rate, 
the issue of achieving and maintaining response 
is of special concern to the gatherers and ana- 
lysts of longitudinal data. As indicated in Table 
5.3, the mature cohorts had maintained, at least 
until 2000, response rates into adulthood of 
over 70% of the eligible population (i.e. exclud- 
ing those who are known (or thought) to have 
died or emigrated). For details, readers are 
referred to Wadsworth et al. (2003) and Plewis 
et al. (2004). The 1946 response rate is based on 
the 5632 follow-up sample and does not allow 
for initial loss when the birth survey was done. 
For the MCS this item alone brings the response 
rate at the first survey also to the 72% mark 
(Plewis, 2004). The success of the fieldwork 
operation is more often judged by response out 
of the sample actually issued. For the first MCS 
survey this was 81% (like NCDS in 2004-5), 
although the unissued eligible cases for MCS 
were not found or withheld by the DSS, and the 
unissued eligible cases for the mature cohorts 
consist of permanent refusals and those thought 
to be beyond hope of tracing (given avail- 
able resources). CLS issues a technical report 
on sampling with each deposit of data to the 
archive, estimating and analyzing nonresponse 


rates, discussing item nonresponse, potential 
biases, and the possibility of weighting to cor- 
rect for nonresponse bias. These reports are not 
yet available for the most recent surveys, where 
there has clearly been substantial attrition, but 
also some recuperation of informants who had 
previously not responded, including a group of 
“New families”, missed by the DWP at the first 
MCS survey but interviewed in the second. 
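
The distinction drawn above between response out of the eligible population and response out of the sample actually issued can be made concrete with a small calculation. The following is a minimal sketch in Python; the function name and the counts are hypothetical, chosen only for illustration, and are not MCS figures.

```python
def response_rates(eligible: int, issued: int, productive: int) -> dict[str, float]:
    """Return response as a share of the eligible population and of the issued sample.

    Assumes the issued sample is a subset of the eligible population and that
    productive cases are a subset of the issued sample.
    """
    if not 0 < issued <= eligible:
        raise ValueError("issued must be positive and no larger than eligible")
    if not 0 <= productive <= issued:
        raise ValueError("productive must lie between 0 and issued")
    return {
        "of_eligible": productive / eligible,
        "of_issued": productive / issued,
    }


# Hypothetical counts for illustration only (not taken from any cohort study).
rates = response_rates(eligible=1000, issued=900, productive=720)
print(f"{rates['of_eligible']:.0%} of eligible, {rates['of_issued']:.0%} of issued")
```

The same number of productive interviews gives a lower rate against the eligible population than against the issued sample, which is why it matters which denominator a published rate uses.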

The technical report on sampling, which 
accompanied the deposit of the first Millen- 
nium Cohort survey (Plewis, 2004) deals with 
response and potential bias in the drawing 
of the original sample (noteworthy is the low 
response from the ethnic minority and Northern 
Irish wards). The analysis of the 3500 cases lost 
to the survey between the first and second
surveys is not ready at the time of writing, nor is
the analysis of the disappointingly high losses
(about 1500 cases per cohort) in the 2004-5
rounds of the 1958 and 1970 cohorts. This has 
not been for want of efforts to trace (see Table 
5.3). It may be that there has been an increase 
in residential mobility and/or a diminution of 
the willingness of providers of address informa- 
tion to divulge personal details, given the new 
provisions of the Data Protection Act protect- 
ing privacy. It is not easy to draw a firm line 
between those unproductive cases who have 
refused and those with whom no contact has 
been made. Mobility is not necessarily a prob- 
lem for follow-up if the cohort member keeps 
in touch. 

Hitherto, the consensus has been that attri- 
tion has biased the 1970 and 1958 cohorts 
somewhat away from less advantaged and less 
able individuals. This means that the datasets 
should be treated with caution as sources of 
cross-sectional estimates of prevalence, but as 
sources of evidence for models of longitudinal 
processes they should still be useful provided 
the analysis controls for variables which are 
correlated with survey loss, and the possibility 
of attrition bias is taken seriously on a topic by 
topic basis. 
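
One simple way to act on the possibility of weighting to correct for nonresponse is a weighting-class adjustment, in which each remaining respondent is weighted by the inverse of the response rate in their class. The sketch below illustrates that general technique only; it is not the procedure used by CLS, and the function name, class labels and case identifiers are all hypothetical.

```python
from collections import defaultdict


def nonresponse_weights(baseline, responded):
    """Weight each respondent by the inverse of the response rate in their class.

    baseline:  list of (case_id, weighting_class) pairs for the original sample.
    responded: set of case_ids still responding at the later sweep.
    Returns {case_id: weight} for respondents only.
    """
    issued = defaultdict(int)  # cases per class at baseline
    kept = defaultdict(int)    # cases per class still responding
    for case_id, cls in baseline:
        issued[cls] += 1
        if case_id in responded:
            kept[cls] += 1
    return {
        case_id: issued[cls] / kept[cls]
        for case_id, cls in baseline
        if case_id in responded
    }


# Hypothetical example: the "low" class loses half its cases, the "high" class none.
baseline = [(1, "low"), (2, "low"), (3, "low"), (4, "low"),
            (5, "high"), (6, "high"), (7, "high"), (8, "high")]
responded = {1, 2, 5, 6, 7, 8}
weights = nonresponse_weights(baseline, responded)
```

In this example the remaining "low" cases each count twice, so the weighted sample again has the class composition of the baseline. The adjustment only helps to the extent that, within a class, those lost resemble those retained.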

The secret of successful longitudinal data
collection is the goodwill of the cohort mem- 
bers (or their families). The surveys rely on 
their cooperation not only in giving interviews 
but in keeping in touch and updating their 
addresses. The rules of the NHS Research Ethics
regime, which must give clearance to the studies,
forbid giving material incentives to adults,
although it does permit a small present for the 
cohort child (and a smaller one for their sib- 
lings). The only incentive that can be offered 
the cohort members to continue their coope- 
ration is to let them know that their participa- 
tion is important, and that their information is 
being used to good ends. They are offered as 
much feedback as we can afford, via an annual 
mailing and a dedicated website, but the bud- 
get could only run to offering a discount on the 
purchase of the book about the “Children of the 
New Century”, and an 8-page magazine-style 
summary for all the MCS parents. 


4 Findings and scope for analysis of 
the Millennium Cohort 


There are over 1000 publications using data 
from the cohort studies reported to the CLS, 
listed on the CLS website www.cls.ioe.ac.uk, 
along with a selection of references to key 
publications, to which readers are referred as 
space precludes extensive referencing here. 
Health-related research on NCDS has recently 
been reviewed by Power and Elliott (2005). 
It famously includes establishing the conse- 
quences for the child of smoking in pregnancy, 
linking childhood disease to adult outcomes, 
discovering which conditions persist and 
which are outgrown. Work on “health inequal- 
ities” has shown the relatively small extent to 
which socioeconomic variations in outcomes 
are governed by health selection and the greater 
extent that risk and disadvantage accumulate 
across the life course. 

The first three cohort studies have also docu- 
mented the intergenerational transmission of 


educational advantage, and contributed to a 
debate on social mobility. They have been used 
to estimate returns to investment in higher edu- 
cation and the gender premium in pay. They 
provided key evidence on failure to acquire the 
basic skills of literacy and numeracy by adult- 
hood. Behavioral as well as cognitive indica- 
tors have been shown to play an independent 
role in predicting outcomes, including crime 
and adult mental illness. Family disruption in 
the parental generation predicts faster forma- 
tion and dissolution of partnerships by the next 
generation. In due course, the MCS will be 
able to show whether these patterns persist or 
change for those growing up in the 21st cen- 
tury, although this will take several decades to 
materialize. 

Initial findings from the first survey of MCS 
were outlined in Dex and Joshi (2004), which 
is a set of descriptive tables designed to
stimulate further analysis, including by external
users of the public dataset. A more considered set
of essays was collected by the same editors, Dex
and Joshi (2005), from the team of collaborators 
who had helped put the survey together. Other 
publications using the dataset are beginning to 
appear and be listed on the CLS website. 

The first survey of MCS has already set out 
the diverse initial circumstances from which 
the “Children of the New Century” are setting 
out on life. The range of data already collected,
and soon to be collected in childhood, provides
a richer resource to investigators of the
circumstances and outcomes of contemporary
children than did the 1958 and 1970 studies in
their early years. The nine months survey per- 
mits analysis of the interrelation of the health of 
the child, health of both parents, social, demo- 
graphic, economic and attitudinal variables, 
neighborhood type, and ethnicity. Information 
on ethnic group is effectively absent from the 
earlier cohorts, but MCS has nearly 3000 fami- 
lies from minority ethnic groups, tightly con- 
centrated in the areas where it was expected to 
find them. 

Poor families were also concentrated in the
areas selected to over-represent them, but there 
were relatively more outside such areas than 
ethnic minorities outside the selected wards 
and more non-poor within poor areas than 
white respondents in the minority ethnic areas. 
The minority ethnic families tended themselves 
also to be economically disadvantaged, but to 
display cultural diversity across the different 
groups. Other correlates of economic disadvan- 
tage were young motherhood and lone moth- 
erhood. Of the (weighted) sample, 16% were 
mothers without a (regularly) resident partner. 
The questions revealed a spectrum of degrees 
of contact between the cohort child and their 
father, nonresident, semi-resident and present 
in the household (Kiernan and Smith, 2003). 
Fathers’ involvement in parenting the cohort 
child began for 86% of them with being present 
at the birth. This is a marked contrast to the 
earlier cohorts where such information was not 
even recorded. On other aspects of childbirth 
it is possible to make some more precise inter- 
cohort comparisons: 22% of MCS births were 
caesarean sections, twice the rate in 1970; only 
2% of the births were at home, rather than hos- 
pital, compared with 42% in the 1946 cohort 
and 35% in 1958 (Dex and Joshi, 2005). 

The cohort studies have also been used to 
chart the increase over time of mothers’ paid 
work (Hansen et al., 2006). MCS documents 
this trend having reached the point where half 
of all mothers were employed by the time the 
cohort child was aged 9 months (this took 
about 5 years after 1970 and 7 years after 
1958). As a reflection of this change, and of 
associated policy interest, it includes far more 
information on childcare than its predecessors. 
It also illustrates that there is still a marked 
variation in mothers’ attachment to the labor 
market, greater for the more educated, older 
women in two-parent families, and particu- 
larly low for certain ethnic groups (Pakistanis 
and Bangladeshis among the larger minorities), 


while Black mothers had the highest rates of 
full-time employment. 

Both first and second generation studies of 
the previous cohorts have been influentially 
used to demonstrate adverse effects of family 
poverty on child development. This has helped 
inspire policies such as Sure Start and the
Children’s Fund, which MCS is now being used 
to monitor. The abolition of child poverty and 
the support of family life in general is explicit 
government policy. This contrasts with the very 
different policy regimes and economic circum- 
stances obtaining when the previous cohorts 
were children. There is already evidence from 
the past of relationships between child out- 
comes and family structure, mother’s employ- 
ment, migration, and neighborhood type (e.g., 
McCulloch, 2006) which can be explored in 
greater depth with the richer data collected 
in MCS. 


5 Conclusion 


The British national birth cohorts have not only 
established a tradition in the UK but they have 
also had their imitators in several other coun- 
tries around the turn of the Millennium. There 
are national child cohort studies already under- 
way in Canada, USA, Australia, Denmark, and 
Norway, with major studies being planned in 
the USA, France, and Ireland. None are exact 
replicas of the Millennium Cohort Study, but 
there are sufficient similarities to permit cross- 
country as well as cross-cohort comparisons. 
Whether the British tradition will be able to 
sustain existing studies and support any new 
ones in later decades of the 21st century will 
depend on maintaining a symbiosis between 
informants, funders, and users. Potential users 
of all the CLS cohort studies are invited to reg- 
ister on the Centre’s website, www.cls.ioe.ac.uk 
for up-to-date information, listing of in-house 
and user publications, detailed documentation, 
particulars of substudies, and news of data 
deposits and training events. Users of cohort 
study data are requested to observe their 
obligation to report the publications that the
studies have yielded for dissemination on this 
website. The future of the studies depends not 
only on maintaining a good response rate from 
cohort members, but also from the scientific 
community. 


Glossary 


BCS70 1970 British Cohort Study 


CHES Child Health and Educational Study 
(name of BCS70 at ages 5 and 10) 


CLS Centre for Longitudinal Studies, the cus- 
todian and curator of the latest 3 national cohort 
studies, an ESRC Resource Centre since 2004, at 
the Institute of Education, University of London 


DSS Department of Social Security, later 
Department for Work and Pensions 


ESF European Social Fund 
ESRC Economic and Social Research Council


Health visitors Community nurses charged 
specifically with domiciliary care of young fam- 
ilies 

ICCS International Centre for Child Studies. 
The charitable organization which sponsored 
BCS70 (and research on NCDS) from 1983, 
Director, Neville Butler 


MCS Millennium Cohort Study 
(funded mainly by ESRC, field name Child of 
the New Century) 


MRC Medical Research Council 
NCB National Children’s Bureau 


NCDS National Child Development Study 
(1958 birth cohort) 


NICHD National Institute of Child Health and
Human Development (US)


NRDC National Research and Development 
Centre for Adult Numeracy and Literacy 


NSHD National Survey of Health and Deve- 
lopment (MRC survey of the 1946 birth cohort) 


ONS Office for National Statistics. Leads con- 
sortium of government departments co-funding 
MCS 


SSRU Social Statistics Research Unit, City
University, London, the previous home of 
NCDS and BCS70 


Youthscan The name given to the BCS70 
survey at 16 

Chapter 6


Retrospective longitudinal research: 
the German Life History Study 
Karl Ulrich Mayer 


1 Introduction and overview 


This article provides a comprehensive overview 
of the German Life History Study (GLHS). 
The GLHS is an almost entirely quantitative 
study based on nationally representative sam- 
ples of eight birth cohorts in West Germany 
born between 1919 and 1971 and five birth 
cohorts in East Germany born between 1929 
and 1971. Data was collected in nine differ- 
ent surveys between 1981 and 2005. These sur- 
veys were based on retrospective measurements 
in the sense that respondents were asked to 
recall episodes and events of their lives from 
the day of birth (e.g., place of residence) up 
to the time of the interview. Longitudinal data 
was recorded as event sequences in multiple 
life-domains and dated monthly. In this man- 
ner, time-continuous data covering all of past 
lives is being reconstructed. Altogether, more 
than 13,921 quantitative life histories in the 
form of multiple life domain event histories 
were collected from 11,441 respondents. At the 
time of the interview, the birth cohorts covered 
ranged in ages from 27 to 65. In addition, a com- 
ponent of the GLHS was incorporated in the 
Berlin Aging Study (Mayer and Baltes, 1996) 
where besides medical and cognitive assess- 
ments retrospective life histories were collected 
for 516 respondents living in West Berlin in 


the nineties between the ages of 70 and 103. 
Furthermore, a series of smaller methodological 
studies was conducted to assess and improve 
response rates and retrospective measurement. 

A combination of six characteristics makes 
the German Life History Study distinctive: 


1. It is a study of nationally representative birth 
cohorts. 

2. It obtains longitudinal data by retrospective 
measurement. 

3. Across cohorts it spans an historical period 
of more than 50 years and it is, there- 
fore, a unique instrument for studying social 
change. 

4. It covers several life-domains, especially 
both family and work. 

5. By sampling both West Germany and East 
Germany it has an inbuilt design to study dif- 
ferences between sociopolitical systems and 
one important case of post-Socialist transfor- 
mation. 

6. It has invested more than any comparable 
study in the assessment of the quality of ret- 
rospective measurement as well as in careful 
data editing. 


The German Life History Study started as 
an effort to trace changes in stratification pro- 
cesses and their embedding in contexts of social 
and economic discontinuity. It developed into
a comprehensive research program on social 
mobility, life-course dynamics, transitions to 
adulthood, gender and cohort disparities, as 
well as the consequences of welfare state poli- 
cies on patterns of life courses. With the 
opening of the Berlin Wall and the integra- 
tion of the former German Democratic Repub- 
lic into the Federal Republic of Germany, we 
extended the study to East Germany in order 
to reconstruct the former Socialist GDR soci- 
ety as well as the impact of the post-Socialist 
transition on life courses and the unification 
processes between the two Germanies. 

The article is structured into five major sec- 
tions. In Section 2 we will briefly summarize 
the motives and origins of the GLHS, its analyt- 
ical goals, and the institutional contexts within 
which this research program was located. In 
Section 3 we provide basic information on the 
various surveys and their contents, as well as on 
the methodologies employed. Since the GLHS 
has been chosen for this handbook, among other things,
as an exemplar of retrospective longitudinal 
studies, we will focus in Section 4 especially 
on the issues, problems and solutions related 
to retrospective measurement. In Section 5 we 
will address the substantive areas for which 
the GLHS was intended, using and highlighting 
major findings from the study. In Section 6 we 
provide practical information on data access, 
data documentation, and give an introduction 
to the publications which have resulted from 
the GLHS. 

We concentrate here on the more method- 
ological and technical aspects of the German 
Life History Study and on a series of detailed 
study documentations (see 4.1) as well as 
on earlier overviews by Brückner and Mayer
(1998); Diewald et al. (2006: Appendix 1); 
Hillmert and Mayer (2004): Ch. 11; Solga (1996); 
and Wagner (1996). For a general introduction 
into the research program of life-course the- 
ory and analysis, see Mayer (1990, 2000, 2004; 
Mayer and Huinink, 1990); for an explication 


of its theoretical rationale, see Mayer (2001); 
Mayer and Müller (1986).


2 Origins, goals and institutional 
contexts of the German Life 
History Study 


2.1 Origins 


The GLHS was initiated by Karl Ulrich Mayer 
and Walter Müller in the late seventies at the
University of Mannheim (Germany). It grew out 
of four related research contexts. First, based 
on our prior work on intergenerational social 
mobility, we were interested in unraveling the 
mechanisms and processes generating socio- 
economic inequalities across the life course 
going beyond the static comparisons in mobi- 
lity transition matrices and the highly reduced 
structural model of status attainment research. 
Second, also in prior research based on cen- 
sus data, we had uncovered large differences 
between cohorts due to the large historical dis- 
continuities resulting from World War II and 
the immediate postwar period. While the war
and postwar turbulences were succeeded by the
German “economic miracle” of the sixties and, until
the mid-seventies, by rapid educational expansion
and occupational upgrading, it was unclear
whether positive trends would continue or be 
reversed. Therefore, we wanted both to detail 
these cohort differences and extend and update 
them to more recent birth cohorts. Third, the 
GLHS project was part of a research group com- 
prising both economists and sociologists from 
the universities of Mannheim and Frankfurt 
interested in issues of social and economic 
accounting, the impact of social policies, and 
the use of individual and household microdata 
for these purposes. Fourth, our primary prior 
data sources were individual-level longitudi- 
nal data from the Microcensus-Supplementary 
Survey from 1971 of persons born between 1920 and 1940
(comprising 1% of the population, i.e., about 
half a million cases). This data source dried 
up because rigid data protection laws and rules 




were enacted and the West German Census 
Office lost interest in longitudinal data. There- 
fore, we were pushed to collect our own data 
rather than use secondary material. 


2.2 Goals 


The initial analytical goals for the GLHS 
derived from our sociological interests in social 
stratification, gender inequality, social change, 
and the welfare state. In regard to social strat- 
ification, we wanted to gain a better causal 
understanding of the mechanisms and processes
generating socioeconomic and gender
inequalities across the life course going beyond 
crude models of social background and educa- 
tional attainment by taking into account more 
fully family histories, residential histories, edu- 
cational and training trajectories, labor market 
processes, and family formation. In regard to 
social change, we wanted to employ cohort 
comparisons to trace more precisely social con- 
tinuities and discontinuities as well as the 
impact of period and cohort effects. In regard 
to the welfare state, we were interested in 
the social consequences of institutional frame- 
works and social policies for patterns of life 
courses and life chances in the particular con- 
text of the “German social model” and its 
ongoing reforms. Methodologically, we were 
clearly fascinated by the new potential offered 
by micro-level longitudinal data, enhanced 
computing capacity, and recently developed 
dynamic modeling for exploratory, descriptive 
and causal research, as well as by the promise to 
transform biographical case studies to nation- 
ally representative samples. 


2.3 Organization, institutional contexts 
and funding 


In 1979 the GLHS started as a project of the 
Special Research Unit on Microanalytic Foun- 
dations of Social Policy (Sfb 3) of the German 
National Science Foundation at the universities 
of Mannheim and Frankfurt with Karl Ulrich 
Mayer as principal investigator. Support from 
this grant extended into the early nineties.
An important boost for the design and early 
data collection of the GLHS came from the 
fact that in 1979 I also became a Program and 
later Executive Director at ZUMA, the German 
National Survey Research Center in Mannheim. 
Significant methodological solutions from the 
professional staff of ZUMA were provided on 
sampling, questionnaire design and fieldwork, 
coding, as well as data organization. It was 
a fortunate coincidence and crucial precondition for the
further development of the study that in 1983 
I became director at the Max Planck Institute for 
Education and Human Development in Berlin 
and Head of its Center for Sociology and the 
Study of the Life Course, an institute devoted 
to basic research. The Max Planck Associa- 
tion for the Advancement of Science provided 
through this Institute until the year 2005 the 
almost exclusive funding for the later cohort 
studies, the personnel resources, and a rich 
intellectual environment. In 2003 I accepted 
an offer from Yale University and the GLHS 
moved from Berlin to New Haven and my Cen- 
ter for Research on Inequalities and the Life 
Course (CIQLE). One of the cohort studies (on 
the 1964 and 1971 birth cohorts for West Ger- 
many) was cofunded by the German Labor Mar- 
ket Research Institute (IAB) and partly financed 
by the European Social Fund. 

The German Life History Study is above 
all a collective effort of highly dedicated 
research and technical staff. In its lifetime of 
more than a quarter of a century, 18 full- 
time researchers have been associated with 
the GLHS, 16 professional and technical staff 
persons, 16 doctoral students, and more than 
ten post-doctoral students. Some of the senior 
research associates served at various times 
as study directors for various cohort sur- 
veys (among them Johannes Huinink, Martin 
Diewald, Britta Matthes, Steffen Hillmert, Heike 
Trappe, Götz Rohwer, Reinhard Nuthmann,
Michael Wagner, and Ineke Maas). Erika Brückner
served as the head of survey operations




during crucial years of the study. Apart from the 
panel study on the 1971 cohort, all of the data 
collections were carried out in collaboration 
with commercial survey research firms. In tem- 
poral sequence these were GETAS in Bremen 
(Barbara von Harder), Infratest Social Research 
in Munich (Klaus Kortmann), and INFAS in 
Bonn-Bad Godesberg (Doris Hess). 


3 Surveys and methods: sampling, 
data collection, data editing 


Pilot studies for the GLHS began in 1979. From 
1981 to 2005 we conducted five surveys (covering
eight different cohorts or cohort groups)
and one panel study on West German samples
and two surveys (covering five different
cohorts or cohort groups) and two panels on
East German samples (Table 6.1). For the pur- 
pose of this article we denote by the term 
“panel study” a design where some years later 
we re-interviewed the respondents of an earlier 
retrospective study and measured the interim 
period by continuous retrospective data. 


3.1 The West German studies 


The first West German component of the
GLHS, collected in 1981-1983, constructed
representative samples of three different groups 
of birth cohorts: 1929-31, 1939-41 and 1949-51 
(Mayer and Brückner, 1989) with an overall
sample size of 2171. This 1981-83 survey set in 
many respects the exemplar for the following 
surveys of the GLHS. It concentrated on small 
ranges of birth cohorts in order to capture fine- 
grained period and cohort effects. Basically, 
sampling costs prohibited focusing on 1-year 
birth cohorts and, thus, three-year bands served 
as the best compromise. Further, it established 
the basic recipe for data collection by focusing 
on retrospective event histories in separate life- 
domains (residence, family of origin, education, 
training, employment and careers). Events and 
transitions were recorded forward in time and 
dated monthly. 
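
To make the idea of monthly dated event histories in separate life-domains concrete, the sketch below shows one possible way to hold such episodes and to check a domain for uncovered months. It is an illustration only, not the GLHS file layout: the class, field and function names are hypothetical, the example episodes are invented, and treating the end month as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    domain: str              # life-domain, e.g. "employment" or "residence"
    state: str               # what the respondent was doing in this episode
    start: tuple[int, int]   # (year, month) in which the episode began
    end: tuple[int, int]     # (year, month) in which it ended (assumed inclusive)


def month_index(ym: tuple[int, int]) -> int:
    """Map (year, month) to a running month count so months can be subtracted."""
    year, month = ym
    return year * 12 + (month - 1)


def duration_months(ep: Episode) -> int:
    """Length of an episode in months, counting both start and end month."""
    return month_index(ep.end) - month_index(ep.start) + 1


def gaps(episodes: list[Episode], domain: str) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return (end of previous, start of next) pairs where months are uncovered."""
    ordered = sorted(
        (e for e in episodes if e.domain == domain),
        key=lambda e: month_index(e.start),
    )
    found = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if month_index(nxt.start) > month_index(prev.end) + 1:
            found.append((prev.end, nxt.start))
    return found


# Invented example history for one respondent.
history = [
    Episode("employment", "apprenticeship", (1950, 4), (1955, 3)),
    Episode("employment", "employed", (1955, 4), (1960, 12)),
    Episode("employment", "employed", (1961, 6), (1965, 1)),
]
first_spell = duration_months(history[0])
uncovered = gaps(history, "employment")
```

A check of this kind flags stretches that an interviewer or data editor would want to resolve, since a continuous history should account for every month in a domain.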


In the years 1985-87 the birth cohort of 
1919-21 (Brückner, 1993) was added in two
separate surveys (n=1412), one by means of 
personal interviews and one by telephone inter- 
views. By switching from personal to telephone 
interviews we showed that even very long
standardized life histories (median 2-2.5 hours up
to 6 hours) could be collected by using the tele- 
phone. Couples’ decisions on retirement and 
the gendered impacts of old-age insurance poli- 
cies became the focus of the analyses of this 
study (Allmendinger et al., 1993). In 1988-89 
data for the cohorts born 1954-56 and 1959- 
61 (Brückner and Mayer, 1995) were collected
(n=2008). In 1997 and 1998 we extended
the cohort series by collecting by telephone 
interviews almost 3000 life histories from the 
cohorts born between 1964 and 1971. Why 
this extension in regard to birth cohorts? It 
became clear very soon that by focusing on the 
cohorts born between 1930 and 1950 a very 
particular “success story” of continuous col- 
lective advancement would have to be writ- 
ten, while we had good reasons to assume that 
the 1930 cohort was much worse off than the 
cohorts born before and that a similar rever- 
sal was argued for the cohorts born after 1950. 
Also, we found it very attractive to include the 
earlier cohort born around 1921 because we 
could trace it up to retirement age and look 
at the impact of their war experiences. For the 
more recent cohorts, we knew that their demo- 
graphic behavior had radically changed and 
we wanted to pursue explanations for these 
changes (Huinink and Mayer, 1995a). The 1964 
cohort warranted special consideration because 
it was not only the largest in absolute size 
at birth, but grew by about a fifth through 
immigration. Both the 1964 and 1971 cohorts 
were of particular interest due to the economic 
downswings in the eighties and nineties and 
the alleged impacts of international competitive 
pressures. This interplay of cohort size, labor 
market conditions and policy measures became 


Table 6.1 Cohorts and panels in the German life history studies

Thematic focus:
  Life courses and societal development
  The war generation and the transition to retirement
  Lost generation? Career entry in the labor market crisis
  The impact of the baby boom: education, training, and early work lives
  Early careers and family formation
  Life courses and historical change in the German Democratic Republic
  East German life courses after unification

Field period  Population                           N             Response rate  Birth cohorts (age observed)
1981-83       West Germany, including West Berlin  2171          62.3%          1929-31 (50), 1939-41 (40), 1949-51 (30)
1985-86       West Germany, including West Berlin  407           48.8%          1919-21 (65)
1987-88       West Germany, including West Berlin  1005          73.3%          1919-21 (67)
1988-89       West Germany, including West Berlin  2008          86.1%          1954-56 (33), 1959-61 (28)
1998-99       West Germany, including West Berlin  2909          66.1%          1964 (35), 1971 (27)
2005          East and West Germany                1073          N/A            1971 (34)
1991-92       East Germany, including East Berlin  2331          52.2%          1929-31 (60), 1939-41 (50), 1951-53 (40), 1959-61 (30)
1996-98       East Germany, including East Berlin  610           49.5%          1971 (27)
1996-97       East Germany, including East Berlin  1407 (Panel)  74.1%          1929-31 (65), 1939-41 (55), 1951-53 (45), 1959-61 (35)

Data collection mode:
  Personal interviews
  Computer-assisted telephone interviews
  Computer-assisted telephone and personal interviews
  Personal interviews
  Computer-assisted telephone and personal interviews

Core contents:
  Detailed event histories: family formation and fertility, education and
  training, employment and interruptions, residential mobility. Detailed
  questions on family of origin, including siblings. All studies also include
  some questions on politics and religion, current economic situation, and on
  the interview situation.

Specific contents:

Detailed work histories for spouses; assets and savings; activities,
interests, needs

Impact of wartime events on respondents’ life histories; political
socialization; transition to retirement

Additional questions on vocational/professional training, labor market
entry, future aspirations

Labor contracts and job decisions; occupational control beliefs;
memberships (politics, associations, religion); household structure and
income, life satisfaction

Impact of delayed career and family formation; work-family issues

Economic situation; changes and experiences before and after fall of the
wall; social networks and informal exchange

Membership in organizations 
and associations; party 
preferences; 
social support networks; 
control beliefs and various 
other psychological scales; 
economic situation 
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the central focus of our monograph on the latter 
two cohorts (Hillmert and Mayer, 2004). 


3.2 The East German studies


When the Berlin Wall fell, we seized the oppor- 
tunity to extend the study to East Germany 
and its transformation because this extraordi- 
nary opportunity presented us with an exem- 
plary case, a natural experiment, for studying 
life courses under the conditions of extreme 
societal discontinuities. Data on four cohort 
groups were collected in East Germany in 
1991-92: 1929-31, 1939-41, 1952-54, and 
1959-61 (n= 2331) (Goedicke et al., 2004). We 
had to rely on personal interviews since tele- 
phone coverage was still low in the East. The 
selection of the specific cohorts was partially 
due to our attempt to match the West German 
samples and to adjust for specifics of East 
German sociopolitical history. We went back 
to a fraction of our East German respondents 
with a mailed questionnaire in 1993 (enlarging 
on personality and network variables) (n = 610) 
and interviewed them on their intervening life 
trajectories again in 1996-97 (n=1407). Our 
primary goal with this panel was to cover more 
of the transformation process and to explain 
its outcome in a life-course framework. At that 
time we added the 1971 cohort (n=1407) for 
East Germany which was then re-interviewed 
in 2005. We used these studies both for a
reconstruction of lives under the Communist
regime in the GDR (Huinink and Mayer, 1995b)
and for a comprehensive study of life courses
during the post-Socialist transformation
(Diewald et al., 2006).


3.3 The 2005 panel study and qualitative 
biographical complement 


In the year 2005 we re-interviewed 1073 of the
1805 men and women born in 1971 from both
the 1997 East German Study and the 1998 West 
German Study. We had four motives for the 
panel study. First, when we interviewed the 
1971 respondents they were about 27 years old. 


That means few of them were married, almost 
none of them had children, and about a quar- 
ter were still in training. In order to capture 
their full transition to adulthood and to unravel 
the mysteries of their delayed family formation, 
we were highly interested to follow up on their 
life trajectories. Second, we had developed our 
own computer-assisted telephone life-history 
questionnaire, especially suited to optimize 
recall, and wanted to test this instrument under 
normal survey conditions. Third, we had for 
the first time the opportunity to conduct the 
study in our own telephone interviewing lab- 
oratory and, thus, to exert full control over 
the process. Finally, we wanted at last to com- 
bine quantitative and qualitative methods of 
data collection. On the basis of the quantitative 
protocols we selected a small sample stratified 
according to gender, East-West, North-South, 
Urban—Rural and High-Low qualifications. In 
cooperation with the Berlin Institute of Social 
Research we conducted 27 narrative biographi- 
cal interviews which are available both on tape 
or digital record and in transcript. Field time 
ran from early 2005 to the end of June 2005 and 
was truncated due to restricted funds. Selectiv- 
ity should, therefore, be carefully monitored on 
the basis of the initial surveys. 


3.4 Methods studies 


In addition to the nine major surveys, we con- 
ducted a series of supplementary methods stud- 
ies. This started in 1979 with pilot studies 
testing for the accuracy of recall based on a 
local cohort survey carried out 10 years before 
and testing for variants of questionnaire design 
(fully standardized domain-specific event his- 
tories versus partially structured narrative 
interviews versus life-history calendars). In the 
context of the 1997 panel study of East Ger- 
mans, we carried out a nonresponse study 
(Wehner, 2002). Moreover, for each of the stud- 
ies assessments of representativeness were per- 
formed using relevant cross-sectional census 
data (e.g., Blossfeld, 1987b). Several studies on
the reliability of retrospective measurements
were also conducted (Brückner, 1995; Reimer
and Matthes, 2006). 


3.5 Sampling 


Our decision to use birth cohorts as popu- 
lation units instead of the entire population 
was primarily motivated by prior work about 
cohort differences in Germany based on census 
data. Financial restrictions ruled out obtaining 
a sample covering the full population and, at 
the same time, gaining a sufficient number of 
people in single birth cohorts or narrow birth 
cohort groups. Therefore, specific cohort groups 
were selected. Our prior work with census data 
provided the criteria for these cohort selections. 
For instance, the 1939-41 cohorts were cho- 
sen because they were at the peak of the great- 
est baby boom of the century. The 1929-31 
cohorts were selected because we already knew 
that these were the cohorts most affected by 
World War II and the breakdown after the war. 
The cohorts 1954-56 (in West Germany) and 
1952-54 (in East Germany) were chosen due 
to their pivotal role in demographic changes 
(on the selection of cohorts, see also Mayer and 
Huinink, 1990). The 1964 cohort was selected 
due to its exemplary nature as the most recent
baby boom cohort. Given overall sample sizes 
well below 40,000, there are good reasons to 
rely on cohorts rather than cross-sections in lon- 
gitudinal surveys (Featherman, 1979), but there 
are even more cogent reasons not to rely on one 
single cohort in either retrospective or prospec- 
tive studies. 

Thus, the sample design required select- 
ing people born in specific years. In contrast 
to representative, cross-sectional samples of 
a population, the cohort-centered selection of 
samples (with a sufficient number of people in 
each cohort) requires special procedures. 

This section deals, in particular, with the
practical consequences of using a cohort-centered
approach in a nationwide face-to-face
and telephone survey. Arriving at a representative
sample made up of different adjacent and
nonadjacent birth cohorts requires extensive 
preparatory work. Since the sample is limited 
to only these cohorts, it had to be selected out of 
the entire corresponding population. Four dif- 
ferent methods were used to accomplish this. 
For the face-to-face surveys (1981-83 and 1985), 
household listings based on the ADM Master 
Sample were collected to identify the target 
population. On the basis of these 13,974 private 
households, a sample of cohort members was 
chosen. 

The telephone surveys were based on an 
initial large-scale representative sample of 
households with telephones, out of which 
the informants for the second survey of 
elder cohorts and the two samples for the 
younger cohorts were selected. Starting with 
the 1954-56 cohort, we relied on the now 
electronically available residential registers for 
pre-selected counties and cities. For the East 
German samples the totalitarian character of the 
GDR regime proved helpful for the sampling 
design. The GDR had a unitary population regis- 
ter which could directly be used to draw region- 
ally stratified samples of members of certain 
birth cohorts. 

Special strategies were necessary to compen- 
sate for specific, regional losses, which affected 
primarily major cities or urban centers in the 
first field survey and primarily the points 
lying in rural areas in the second. In both of 
these (face-to-face) surveys, it was necessary to 
make repeated follow-up trips in order to inter- 
view informants who were difficult to reach 
geographically or who were indecisive about 
participating in the survey. Besides the geo- 
graphical dispersion of the informants, another 
probable reason for the slow tempo at which 
data was collected was the strict procedure used 
to select the informants (no substitutions could 
be made on either the household or the indi- 
vidual level). Response rates of around 60% 
may raise doubts about whether the results
obtained in laborious field surveys were worth
the effort required to obtain them. However, 
these response rates are close to the ones for 
the usual cross-sectional surveys in Germany 
despite the fact that our demands on the respon- 
dents’ time were much higher. 

Switching to collecting data by telephone
not only allowed better supervision and
selection of interviewers, but also made a more
centralized field operation possible and, likewise,
appeared to present a viable solution for
identifying samples of specific cohorts. The rate
of those not at home could be reduced from
nearly 4% to 1.5%. Furthermore, the “direct
line” to the respondents made it possible
to split up the long interviews into several
telephone conversations. This proved to
be a big help, especially for the oldest cohort, 
whose interviews were sometimes extremely 
long (one-third of all interviews in the survey 
of the eldest cohort were conducted in two 
or three calls). In view of the mobility of the 
younger cohorts, the up-to-dateness of the tele- 
phone sample, as well as the ease of repeated 
contacts, proved to be an advantage in the pro- 
cess of data collection. 


3.6 Methods of data collection 


Life-domain specific event histories, PAPI 
and CATI-instruments 

There were only a few examples of quantitative 
life-course studies using large samples when 
the GLHS was initiated. In the mid-sixties a 
representative survey of the male population 
known as the Johns Hopkins Study was carried 
out by Coleman and Rossi in the United States 
in 1968, and a similar survey, the Norwegian 
Life History Study, was carried out by Rogoff 
Ramsoy in 1971. Allmendinger (1989) used 
these Norwegian and US data together with the 
GLHS for a study on educational stratification 
and career processes. Noteworthy studies origi- 
nating after the GLHS and mostly modeled on it 
are the Swedish Life History Study (Jonsson and
Mills, 2002), a Dutch Study (de Graaf, 1987), 


and the Swiss Life History Study by Marlis 
Buchmann (see Buchmann et al., 2006), and 
also the study conducted by Mach (2003) which 
is a direct replication of our 1971 cohort study. 

The German Life History Study differed from 
earlier studies in two important aspects. First, 
it is more representative: including women and 
a broader range of birth cohorts. Second, it 
is more comprehensive with regard to life- 
domains: including an education, training and 
employment history, a full residential history 
not only of locations but also of apartments, a 
family history, and a number of other important 
thematic areas of the life course such as parents 
and children. The goal of the questionnaire
construction was to represent the “natural history”
of individuals in society to the greatest extent
to which it could be rendered strictly comparable
between subjects and be made quantifiable.

We also parted ways with earlier (and later) 
studies by not employing a life-history calen- 
dar. In a life-history calendar yearly or monthly 
time defines the rows of a matrix, whereas 
types of events (e.g., schooling, employment, 
marriage, child birth) define the columns. The 
advantage of a life-history calendar is that all 
events and the time dimension are represented 
on a single sheet of paper (or electronic equiv- 
alent), thus facilitating comparability and con- 
sistency across domains. The big disadvantage 
of life-history calendars lies in the fact that only 
very little information can be inserted on spe- 
cific episodes. This format also implies a rec- 
tangular data array where all months or years 
must contain some information. The advantages 
and disadvantages of multiple-domain event 
sequences are complementary. This format is 
efficient in data storage, since it focuses on a 
sequence of spells or episodes and only needs 
to list beginning and ending dates. Thus, it 
corresponds both to the way the information 
is actually solicited in an interview and the 
way it is used in survival and event history 
analysis. Most importantly, it allows us to collect
a relatively large amount of information
for each given episode. For instance, for any
given job we collected data on employment 
status and occupational title, industrial sector 
and size of firm, hours worked, type of con- 
tract, beginning and ending income, as well as 
reasons for job changes. Concentrating on spe- 
cific life-domains actually facilitates recall, but 
requires more investment in checking on incon- 
sistencies across life-domains. In the earlier 
(paper-and-pencil) interviews this was either 
accomplished by fold-out pages or by data edit- 
ing after the interview. In the CATI-versions 
these checks were done automatically leading 
to additional questions on potential inconsis- 
tencies. 
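
The contrast between the two formats can be made
concrete with a small sketch. The Python below is
illustrative only: the record layout, field names, and
example values are assumptions, not the GLHS file
format. It stores a job history as episodes with
beginning and ending dates plus episode-level details,
and then expands it into the rectangular
month-by-domain array that a life-history calendar
implies.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One spell in a life domain; dates are (year, month) tuples.

    Hypothetical layout for illustration, not the GLHS data format.
    """
    domain: str
    start: tuple
    end: tuple
    details: dict = field(default_factory=dict)


def month_index(ym):
    """Map (year, month) to a running month count."""
    year, month = ym
    return year * 12 + (month - 1)


def to_calendar(episodes, first, last):
    """Expand episodes into one row per month, one column per domain."""
    domains = sorted({e.domain for e in episodes})
    rows = []
    for idx in range(month_index(first), month_index(last) + 1):
        row = {"month": (idx // 12, idx % 12 + 1)}
        for d in domains:
            # The calendar cell only records whether a spell is active.
            row[d] = any(
                e.domain == d
                and month_index(e.start) <= idx <= month_index(e.end)
                for e in episodes
            )
        rows.append(row)
    return rows


# Invented example values.
episodes = [
    Episode("employment", (1975, 4), (1978, 9),
            {"occupation": "file clerk", "hours": 40}),
    Episode("employment", (1979, 1), (1980, 12),
            {"occupation": "bookkeeper", "hours": 38}),
]
calendar = to_calendar(episodes, (1975, 1), (1980, 12))
print(len(episodes), "episode records vs", len(calendar), "calendar rows")
```

In this sketch the episode list needs two records where
the calendar needs one row for every month, and the
calendar cells keep only an on/off flag, so the
episode-level details (occupation, hours) are lost in
the expansion.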

Thus, in developing the questionnaire we 
first reviewed life-domains: family of origin 
and one’s own family history; education and 
training; residence and household; employ- 
ment, income, and consumption; social, reli- 
gious, and political participation; friends and 
informal networks (in the first survey also dis- 
abilities and the medical history). For each of 
these life-domains, we first tried to convert it 
into continuous event and state histories. We 
started with the idea that for each life-domain 
there is a more formal, institutionalized, and 
legitimate path, as well as a more informal, 
marginal, and less institutionalized path. Thus, 
besides natural parents, we asked for foster and 
step parents. Besides (full-time) formal school- 
ing and training, we also asked for part-time 
and for further education and training. Besides 
main jobs, we also asked for additional jobs and 
marginal employment. And in regard to family 
formation, survey by survey we extended the 
marriage history to a history of cohabitation and 
of partners. Limitations of the already exces- 
sive interviewing time as well as measurement 
problems led us to cut down the life-domains 
which were finally included. For instance, we 
did not use an instrument for diachronic associ- 
ational membership or for friends across the life 
course. Also, we economized on the household
composition history by attaching, in some of
the surveys, a shortened version of the residential
history (only locations). We also dropped
a good and tested instrument for consumption 
history, namely a history of bought cars and 
their properties. 


Questionnaire design 
The following factors had to be taken into 
account while designing the questionnaire: 


1. The degree of complexity of the question- 
naire, which had to record a great deal 
of information and also do justice to the 
interindividual, group-specific, and cohort- 
specific variance of life courses. 

2. The historical conditions under which struc- 
turally equivalent life events took place in 
different historical periods. 

3. The recording of time (frequencies, dura- 
tions, and absolute and relative points in 
time) both as a simple measurement (dimen- 
sion) and as a way of structuring the life 
course. 

4. The “sensitive” topics in the respondent’s 
life and the socio-demographic data and 
life events of the respondent’s relatives 
(parents, siblings, spouse, children, and 
grandchildren). 

5. The adaptation of the questionnaire to the 
size and field conditions of a representative 
national survey, i.e., the quality of the inter- 
viewers. 

6. The specifically retrospective character of 
the data collected (Brückner and Mayer,
1998: 161). 


Cohort-specific questionnaires 

In one sense, cohort-specific questionnaires are 
merely a specialized case of screening, e.g., 
if the respondent belongs to this cohort, then 
ask the following set of questions. However, 
they go beyond mere screening in that the 
actual content of the questionnaire is changed. 
German institutions and society as a whole have 
undergone a great many changes in the past
hundred years, and people growing up in different
historical periods have encountered widely
varying historical conditions. To name just 
one obvious example, the older cohorts expe- 
rienced the war and its immediate aftermath, 
while the younger ones did not. Similarly, the 
National Socialist educational system in which 
the oldest cohort received its schooling dif- 
fers considerably from the West German educa- 
tional system after World War II, which itself 
has undergone several fundamental changes 
over the last forty years. 

These differences had to be taken into 
account in the questionnaire. The phrasing of 
certain questions as well as the meaning or the 
values of certain variables thus depended on 
the cohort which they describe. Although this
approach usually limits the extent to which
different cohorts can be compared, we deliberately
accepted this limitation in favor of a more accurate
reflection of historical reality.
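
The screening logic described above might be sketched
as follows. This is a minimal illustration under stated
assumptions: the cohort boundaries are taken from the
text, but the module names and their assignment to
cohorts are hypothetical and do not reproduce the
actual GLHS instruments.

```python
# Cohort groups named in the text (first West German surveys).
COHORTS = {
    "1919-21": (1919, 1921),
    "1929-31": (1929, 1931),
    "1939-41": (1939, 1941),
    "1949-51": (1949, 1951),
}

# Hypothetical module names: a shared core plus cohort-specific extras.
CORE_MODULES = ["family", "education", "employment", "residence"]
EXTRA_MODULES = {
    "1919-21": ["wartime_events", "retirement"],
    "1929-31": ["wartime_events"],
}


def cohort_label(birth_year):
    """Return the cohort group for a birth year, or None if not sampled."""
    for label, (first, last) in COHORTS.items():
        if first <= birth_year <= last:
            return label
    return None


def modules_for(birth_year):
    """Screen by cohort, then assemble the cohort-specific questionnaire."""
    label = cohort_label(birth_year)
    if label is None:
        return []  # respondent is outside the target cohorts
    return CORE_MODULES + EXTRA_MODULES.get(label, [])


print(modules_for(1920))
print(modules_for(1950))
```

The point of the sketch is that cohort membership does
more than switch questions on or off: it selects a
different questionnaire content for each cohort group.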


Use of computer-assisted telephone 
interviewing 

The first two surveys were conducted using the 
classical personal or face-to-face interviewing 
method. This method led to several difficul- 
ties, some of which will be discussed below. 
Due to these difficulties, the decision was made 
to conduct further surveys via telephone. Once 
this decision was made, it was but a small 
step to replace the previously used paper ques- 
tionnaire with a computerized questionnaire; in 
other words, to make use of computer-assisted 
telephone interviewing or CATI. This proved to 
be an extremely beneficial decision. One of the 
most important advantages of using CATI was 
the ability to automate the screening process. 
The interviewer no longer had to leaf through 
a long questionnaire filled with little arrows 
and boxes with screening instructions. Instead, 
the computer would automatically screen ques- 
tions and display the next question to be asked 
on a CRT screen. This freed the interviewer 
to concentrate exclusively on conducting the 


interview and recording the data as precisely 
and accurately as possible. 

Moreover, an unlimited number of data valid- 
ity and consistency checks could be automated. 
If erroneous or inconsistent data were entered, 
the interviewer would immediately receive a 
message asking for a correction or confirmation 
of the data. Data validity was checked merely 
by comparing the data entered with previously 
established valid values or ranges. 

Data consistency was checked by comparing 
the data entered both with previously entered 
data and with previously established plausi- 
ble ranges of values. For instance, it might 
be assumed that women have their first child 
between the ages of 18 and 40. By comparing 
a woman’s year of birth with that of her first 
child, her age at the child’s birth could be deter- 
mined. If it fell outside the previously estab- 
lished range of plausible ages, a message asking 
for confirmation would be displayed, e.g., “Was 
your mother really 14 years old when she had 
her first child?” Thus, a major source of errors 
could be eliminated, i.e., corrections could be 
shifted from the later process of data editing 
and high cost re-checking with the respondent 
to the initial interview. 
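
A consistency check of this kind can be sketched in a
few lines. The code below is an illustration, not the
CATI software used in the study: the function name and
the wording of the prompt are assumptions, and only
the plausible range of 18 to 40 years is taken from
the example in the text. Because it compares years of
birth only, the computed age is approximate.

```python
# Plausible range from the example in the text.
PLAUSIBLE_AGE_AT_FIRST_BIRTH = (18, 40)


def check_age_at_first_birth(mother_birth_year, child_birth_year,
                             bounds=PLAUSIBLE_AGE_AT_FIRST_BIRTH):
    """Return None if plausible, else a confirmation prompt.

    Hypothetical helper; the age is approximate because only
    calendar years are compared.
    """
    age = child_birth_year - mother_birth_year
    low, high = bounds
    if low <= age <= high:
        return None
    return (f"Was the mother really {age} years old "
            "when she had her first child?")


print(check_age_at_first_birth(1930, 1955))  # plausible, no prompt
print(check_age_at_first_birth(1930, 1944))  # triggers a prompt
```

The check does not reject the entry; it asks the
interviewer to confirm or correct it while the
respondent is still on the line.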

Additionally, questions could incorporate 
information gained from previously asked ques- 
tions and could, thus, be made very precise and 
specific; such as, “Until when did you work as 
a file clerk in the bookkeeping department of 
company X?” Besides the fact that such ques- 
tions vastly reduce the odds that the interviewer 
or respondent will confuse one event with 
another, such questions also give the respon- 
dents the impression that the interviewer is an 
intelligent person paying attention to what he 
or she is being told. 

One of the few disadvantages of using CATI
was the loss of transparency offered by a large
matrix on a single sheet of paper, e.g., for a
given life-domain. Due to the physical constraints
of a computer system, namely that only a
given amount of information can be displayed
on a screen at one given time, it was necessary
to split the matrices up into smaller templates, 
i.e., sets of questions which could fit onto one 
screen. However, the loss of transparency was 
more than compensated for by the automated 
consistency checks described above. Further- 
more, it was possible to maintain an overview of 
the data by switching between displays of infor- 
mation at different levels; for instance, one dis- 
play showing a list of all the jobs an individual 
had held and another display showing detailed 
information for only one of those jobs. How- 
ever, we did observe that telephone interview- 
ers sometimes took notes on a piece of paper to 
keep track of interdomain consistency. Techni- 
cally, these problems could easily be remedied 
by larger and split screen monitors. 


Fieldwork 

Problems with conducting the interviews were 
expected from the outset for several reasons. 
To begin with, the format of the questionnaires 
appeared unusual and unfamiliar to the inter- 
viewers. It was unknown to what extent the 
respondents would cooperate, given that they 
would be asked so many questions of a personal 
nature. Moreover, the survey firms entrusted 
with the fieldwork had to deal with tasks 
much more demanding than those they rou- 
tinely encountered. In general, the difficulties 
encountered during fieldwork are attributed to 
a lack of cooperation on the part of the infor- 
mants. However, this did not prove to be true 
for the study described here. On the contrary, 
with few exceptions, the respondents exhibited 
an almost astonishingly positive reaction to the 
interview. It was, in fact, the interviewers who 
presented the most difficulties throughout the 
entire data collection process (even after the 
introduction of CATI), although a great deal of 
effort was made to supervise and support them. 


3.7 Data editing 


Editing plays an extremely important role in 
the processing of life-course data. The first and
probably major task is to record the data provided
by the respondent in formats which are
consistent. For instance, if respondents listed
marginal jobs when asked about their main
job history, these had to be transferred into the
data array for second and marginal jobs. Second,
the sequence of chronological events and
numerous events linked to one another in terms 
of content had to be combined into a single, 
individual life history free of inconsistencies 
within and between life-domains. Complete- 
ness and plausibility are important criteria, 
both on the individual and the interindividual 
level (for a detailed description of the process, 
see Brückner, 1993; Hillmert, 2002). Consistency
and plausibility as well as the continuity
of sequential events can be checked in detail 
by using the intrinsic logic of the events and 
their relation to one another as well as to insti- 
tutional and historical contexts. In this respect, 
editing serves as a sort of “internal” validation. 
Like questionnaires, editing methods have to 
be adapted to the specific historical context of 
each cohort. 

The treatment of missing values was a par- 
ticular problem. Completeness is an impor- 
tant prerequisite for the analysis of sets of 
event data. Gaps in the data had to somehow 
be filled, even if follow-up research (which 
included going back to documents or respon- 
dents) could not produce exact information to 
fill them. Dates were reconstructed by using 
so-called artificial months; for instance, 21 
for January = beginning of year, 26 or 27 for
June and July = middle of the year, and so on.
Landmarks, which are usually major histori- 
cal events used to set a relative date for per- 
sonal events, also played a role in the editing. 
These relative dates (“... it took place in the 
same month as the assassination attempt on 
Hitler...”) occasionally misled the interview- 
ers into entering false dates, despite the fact that 
they were provided with a chronological list of 
historical events. Using history books and similar
reference material, the editors were later
able to reconstruct the actual dates on the basis
of the respondents’ relative dates, which were 
usually authentic and very accurate (they are 
referred to as “flashbulb” phenomena in psy- 
chology). Residential and employment history 
were compared to check for data validity and 
consistency and also to enhance or clean the 
data in difficult cases. 
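
The artificial-month convention mentioned above can be
sketched as follows. Only the codes 21, 26, and 27 are
documented in the text; the general rule used here
(code = 20 + approximate month, for codes 21 to 32) is
an assumption made for illustration, as is the
function name.

```python
def decode_month(code):
    """Return (month, is_artificial) for a recorded month code.

    Codes 1-12 are reported months. Codes 21-32 are treated as
    reconstructed ("artificial") months. Only 21, 26, and 27 appear
    in the text; extending the rule to 21-32 is an assumption.
    """
    if 1 <= code <= 12:
        return code, False
    if 21 <= code <= 32:
        return code - 20, True
    raise ValueError(f"unknown month code: {code}")


print(decode_month(6))   # reported June
print(decode_month(21))  # reconstructed: beginning of year
print(decode_month(27))  # reconstructed: middle of year
```

Keeping the flag alongside the month lets an analyst
use the reconstructed dates to close gaps in the event
sequence while still being able to exclude or downweight
them in sensitivity checks.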

We relied on several sources for data editing: 


■ the paper-and-pencil or electronic records
■ biographic abstracts produced from the data
■ the tapes which we recorded from all
  the respondents who volunteered (most of
  them did)
■ written inquiries going back to respondents
■ telephone inquiries going back to respondents.


Across the surveys we contacted about a fifth 
of the respondents again personally in order to 
resolve real or potential errors. 

An ever-increasing number of data valid- 
ity and consistency checks were computerized 
during the course of the study, but comput- 
ers still could neither completely eliminate nor 
even significantly reduce the task of editing the 
data by hand. All in all, the amount of time 
and money needed for the data to be edited was 
about the same or even more than that needed 
for the interviews themselves. 


4 Autobiographical memory 
and retrospective measurement 


Although data collection in a retrospective life- 
course study has a cross-sectional character (a 
population sample is interviewed only once at a 
given time), the data itself corresponds, in prin- 
ciple, in content to that of a sufficiently long 
prospective panel study, including information 
on the entirety of the respondents’ lives. The 
caveat “in principle” relates to prior selectiv- 
ity (emigration or true mortality) in retrospec- 
tive studies and panel attrition in prospective 
studies. 


The use of retrospective data collected in 
a “one-shot” survey offers a plausible alter- 
native to prospective longitudinal data collec- 
tion, especially in the case of life-course studies 
(see Featherman, 1979; Solga, 2001). It is time- 
efficient and cost-efficient. It is time-efficient 
because one does not have to wait for a long 
time until prospective data spans a time length 
of sufficient duration to be of scientific inter- 
est. It is cost-efficient because longitudinal data 
can be obtained by one or a few interviewing 
phases and survey organization does not have 
to be maintained over a long time. Retrospec- 
tive data collections are often also preferred to 
prospective panels because both the quality of 
survey instruments and the scientific interest in 
the subject matters tend to become obsolete in 
panels which run for a longer time. 

Objections raised against retrospective data, 
in particular, regarding the accuracy and preci- 
sion of people’s memory, apply to all kinds of 
biographical data. Even cross-sectional studies 
often contain questions of retrospective charac- 
ter, usually dealing either with the frequency 
or duration of certain events such as schooling 
or medical treatment or with the time at which 
a given event took place, e.g., age at marriage 
or divorce, birth of children, or employment 
dates. Most frequently such retrospective ques- 
tions in cross-sectional studies are of a much 
more ad hoc nature than if they are asked in 
the context of event sequences. They are, there- 
fore, more prone to recall biases. Also, most 
prospective panel studies, in fact, collect some 
retrospective information to cover events and 
changes between the points of data collection 
and to compensate for the arbitrary starting points
of a panel in the lifetime of its sample members. 
It has been estimated that in any given wave 
only a third of the questionnaire content actu- 
ally refers to concurrent states. It is also worth 
mentioning here that major prospective panel 
studies, like the US Panel Study of Income 
Dynamics (PSID) or the British Birth Cohort 
Studies, do not actually allow construction of
continuous event histories. If they allow continuous
data, such as job sequences, e.g., by
reconstructing what happened across the past 
year by a monthly calendar, they often suf- 
fer from inconsistencies between recalled and 
concurrent data. Most significantly, prospec- 
tive panel studies rarely provide representative 
samples of life histories due to attrition or per- 
forated coverage. Solga (2001) has estimated for 
the German Socio-Economic Household Panel 
that between 1984 and the end of 1998 only 
25% of the life courses of the respondents in 
the initial sample were complete. The advantages
and disadvantages of retrospective versus
prospective panel data can, therefore, not simply
be adjudicated in favor of prospective studies.
It might appear that the problems of both retro- 
spective and prospective survey data could be 
solved by the increasing (and important) use of 
longitudinal register data. Apart from still rampant
problems of access, register data are far
from free of limitations. Often administrative
purposes in variable measurement do not match
scientific objectives, variable definitions change over
time, and certain series only start at recent
historical times. Thus, especially if one wants to
go back in time there is often no alternative to 
retrospective designs. 

That said, there is no doubt that potential problems
of retrospective measurement constitute
the biggest challenge to the use of such stud- 
ies in collecting individual-level longitudinal 
data. Therefore, we devote the following sec- 
tions to an assessment of how much error one 
has to expect in using retrospective data and 
whether survey instruments can be fine-tuned 
to reduce such error. Obviously, to hide one’s 
head in the sand, like the proverbial ostrich, 
will not help. Rather one has to tackle the issue 
head on from the very beginning of design- 
ing survey procedures and especially question- 
naire and fieldwork. My general thesis is that 
the quality of instrument design, fieldwork, and 
data editing is much more salient for the reli- 
ability of the data than the question whether
it is retrospective or concurrent. Recall errors
are only a small fraction of the total survey 
error (Groves, 1989). At the outset, let me make 
two observations. First, it is obvious that for 
certain kinds of variables concurrent data is 
almost always better than retrospective data. 
This is, for instance, the case if one is interested 
in household data (and wants to interview all 
household members) or if one wants to collect 
detailed data on income or on attitudes. I stress 
“almost always” because we found that the self- 
employed report their incomes retrospectively 
often much more truthfully than concurrently 
(for fear of the Internal Revenue), and traumatic 
events like marital separation are often revealed 
better with some time distance to the event. In 
a ten-year follow-up we found even the sub- 
jective reasons for not being able to attain the 
desired occupation to be very highly stable. Second,
we are today in a much better position to
deal systematically with retrospective error
than at the time when we started the GLHS. At
that time very few reliability studies were available,
and those were often of an ad hoc nature. The
psychology of memory with its sole distinction 
between episodic and semantic memory was 
light years away from offering either adequate 
analytical or experimental results to be of any 
use. This distinction suggested that recall in 
autobiographical contexts might be even more 
error prone than recall in general. In the mean- 
time, the psychology of autobiographical mem- 
ory has rapidly advanced (Rubin 1986, 1996) 
and the emergence of cognitive survey psy- 
chology has made major progress (Schwarz and 
Sudman, 1996; Sudman and Bradburn, 1996). 
Within our research group Maike Reimer
has systematically worked out the extent and
kind of relevance of autobiographical memory
for retrospective measurement and has conducted
several empirical studies to assess the range 
and types of retrospective error (Reimer, 2001, 
2005a, 2005b; Reimer and Matthes, 2006). The 
psychology of memory has demonstrated that 
persons tend to reconstruct their own biography
by selective forgetting and by a change in
attached meanings in order to enhance subjec- 
tive self-worth and identity. The question is 
whether similar effects occur in the measurement
of more objective life-course data, like
schooling, occupational trajectories, and fam- 
ily events. Are there types of data which are 
more or less reliable? Are recall errors a function
of the time distance between the event and the
interview (which would suggest that retrospective
measurement over short intervals, such as those
between panel interviews, is acceptable, but that long
recall periods might not be)? Are the errors which can be
revealed consequential for the types of substan- 
tive conclusions we are interested in? Can sur- 
vey instruments be fine-tuned to improve recall 
and to minimize recall errors? 

The insights from the psychology of autobio- 
graphical memory cannot, however, be applied 
easily to standardized retrospective surveys. 
Quantitative social science measurement is 
above all selective in the sense that it is highly 
limited to types of data which are valid, reli- 
able, standardized, complete, and comparable 
between subjects. In contrast, autobiograph- 
ical memory functions, per se, as a selec- 
tive, constructing and dynamic process which 
forms a broad pool of episodes and events. It 
selects, presents and encodes subjectively rele- 
vant ones which are then retrieved again selec- 
tively and in a biased manner. In this sense, 
the usual results of autobiographical memory 
research apply much more to qualitative bio- 
graphical research than to standardized retro- 
spective measurement. 

Nonetheless, in both cases encoding in a temporary
store (working memory) and in a long-term storage
“server”, as well as the retrieval processes
from these memories, are crucial.
When persons encode or retrieve episodes and 
events, these must be mobilized as representa- 
tions. Autobiographical memory relates to past 
personal experiences and biographical facts 
without experiential depth. Instances of one’s 
past are remembered better if they fit into
more general schemata (Brewer, 1986). This is
highly significant for sociological life-course 
research since it predominantly is interested in 
highly institutionalized and public trajectories 
rather than highly idiosyncratic private lives. 
Such schemata are also life phases and their 
sequences (Conway and Pleydell-Pearce, 2000). 
However, it is not entirely clear how the results 
of memory psychology on life stages relate to 
the spell sequence character of event histo- 
ries. They seem to suggest that the normal and 
typical will be remembered more easily than 
the nonconventional and atypical. Although 
this seems plausible, it is not a logical conse- 
quence, since deviations from the normal could 
be encoded and retrieved on the basis of the 
more generic schema. 

At any rate, retrospective measurement in 
life-course research is supported by the mechanisms
of autobiographical memory. For both,
life-course structures such as sequences and
temporal relations, hierarchical orderings (e.g.,
main job and second jobs), and horizontal
relationships, such as those between employment
and the family situation, are important
as mechanisms of representation. Thus, it follows
that there is an isomorphism between
the manner of how life courses are structured 
in society and how autobiographical mem- 
ory functions optimally. This is crucial in 
our context, since processes of remembering 
and retrospective measurement are not exclu- 
sively connected as sources of error (selec- 
tivity, biases, simplification), but much more 
importantly built upon the very same cognitive 
“grammar.” Personal memories are organized
through self-schemata, i.e., generalized expec- 
tations a person has about him- or herself. These 
self-schemata are closely connected with nar- 
rative representations of individual life histo- 
ries. Therefore, remembrances which do not fit 
into such schemata are subject to simplifica- 
tion, and changes toward the conventional and 
smoothing.
For retrospective measurement it is of special
interest how, according to research on autobiographical
memory, temporal information is stored
and retrieved. Chronological patterns appear to
be more salient for retrieval than for encoding. 
Moreover, time distance to the event is related 
to recall reliability in a curvilinear fashion. 
Recall is best for very short time distances, then 
worsens, but stays very stable as time distance 
goes on. This result is crucial, since it could 
support retrospective studies with a long time 
frame into the past. In contrast, errors in dat- 
ing events seem to increase with time distance 
in a linear fashion. Experimental data shows 
that early childhood events are badly recalled, 
events for the age between 15 and 25 are well 
remembered, and little is recalled for middle 
age. This, however, suggests that recall is not 
only a function of time, but also a function of 
the probability of occurrence of given events. 

What then can we conclude from the psychology
of autobiographical memory for retrospective
measurement in life-course research?
“Through selective forgetting and reinterpreta- 
tion persons tend to construct consistency.... 
Personal memories are simplified and more 
coherent versions of the actual life his- 
tory... and conform more to conventions and 
social expectations... negative experiences are 
more often forgotten or reinterpreted as positive 
ones.... Personal data can be recalled reliably 
and robustly even after a long time, if the repre- 
sentations match autobiographical memory, if 
they fit into self-schemata, if there is a rich rela- 
tional structure, and if recall has been frequent 
and if biographical details can function as sub- 
stantive or temporal cues” (Reimer, 2005a: 51 
and 52; translation by author). 

In regard to the total survey error (Groves, 
1989), recall errors are a potential part of 
observational errors, such as social desirability. 
Recall errors can be of very different kinds. 
They might relate to incidence and frequency of 
episodes or activities (too few, too many), timing
(ending, beginning), direction (too short, too
long) and extent (e.g., monthly vs. yearly mis-
dating), inconsistencies between life-domains 
(e.g., marriage without cohabitation), and differ- 
ences between subgroups of respondents (e.g., 
all family events are better remembered by 
women) (Reimer, 2005a, Ch. 3). Recall errors, 
however, might also—if handled badly—lead 
to serious nonobservational errors as a conse- 
quence of sloppy sampling, data editing, and 
data organization. Thus, in the pragmatics of 
survey research recall errors must always be 
seen in the context of total survey error. 

Prior research demonstrated recall errors of 
different kinds and magnitudes, but results are 
frequently inconsistent and relate almost exclusively
to short time distances. It also
remains unclear how much of the observed discrepancies
was due to flaws in field research
or more fundamental problems resulting from 
faulty memory (Reimer, 2005a, Ch. 3). There 
are basically two research strategies available to 
uncover the salience of recall errors. The first 
strategy is short-term or long-term replications, 
i.e., the same respondents are being asked about 
the same time of their life repeatedly. The sec- 
ond strategy relates to systematic comparisons 
with evidence which is not channeled through 
the respondent, e.g., personnel documents or 
administrative data. In the GLHS we applied 
both strategies to the assessment of the potential
problems of retrospective measurement. On the 
one hand, in the East German Panel Study we 
asked respondents again in 1997 about their 
lives after 1989, a period which we already 
had covered partially in the 1991 and 1992 
study. This overlap period thus served as a basis 
for the measuring of recall consistencies. On 
the other hand, we asked respondents to allow 
us access to their social security files which 
record all labor earnings and the full employ- 
ment history (except for self-employed and civil 
servants). 
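
The replication strategy can be illustrated with a minimal sketch in Python. It assumes that each interview yields a list of dated episodes and that consistency is measured as the share of months in the overlap period for which both interviews report the same state. The `Episode` record, the month numbering, and the example respondent are hypothetical and are not taken from the GLHS instruments or from Reimer's analyses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One reported spell; months are counted from an arbitrary origin."""
    kind: str   # e.g. "employed", "unemployed"
    start: int  # first month of the spell
    end: int    # last month of the spell (inclusive)


def monthly_states(episodes, first_month, last_month):
    """Expand a list of episodes into one state per month of the window."""
    states = {}
    for ep in episodes:
        lo = max(ep.start, first_month)
        hi = min(ep.end, last_month)
        for month in range(lo, hi + 1):
            states[month] = ep.kind
    return [states.get(m, "missing") for m in range(first_month, last_month + 1)]


def consistency(report_a, report_b, first_month, last_month):
    """Share of months in the overlap window with identical reported states."""
    a = monthly_states(report_a, first_month, last_month)
    b = monthly_states(report_b, first_month, last_month)
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return agree / len(a)


# Hypothetical respondent: months 0-23 are covered by both interviews.
early = [
    Episode("employed", 0, 9),
    Episode("unemployed", 10, 12),
    Episode("employed", 13, 23),
]
# In the later interview the short unemployment spell is omitted
# and the two employment spells are fused into one.
late = [Episode("employed", 0, 23)]

share = consistency(early, late, 0, 23)
n_episode_diff = len(early) - len(late)
```

In this invented case the two reports agree for 21 of the 24 overlap months, and the later report contains two fewer episodes. A month-by-month measure of this kind captures omissions and misdating together; separating errors of incidence from errors of timing would require comparing the episode lists directly.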

Reimer shows that, among other things, “salience”
seems to play a large role in recall processes.
While after five years only 89% name the same
number of children (an inconsistency rate of 11%),
the error drops to 3% for legitimate children.
A conceptual lack of clarity
(“only my children or also the ones of the
partner”) and emotional closeness or distance
(dead or adopted children) influence recall
error. The error rate or, technically more adequately,
the inconsistency rate is twice as high
for men as for women. Incidence seems unaffected
by the time distance between the event to be
recalled and the time of the interview, but the magnitude
of deviations increases. In regard to the
timing of leaving the parental home, in the sec- 
ond interview very early or very late depar- 
tures were brought closer to the median. In 
regard to job histories, in the later interview the 
number of jobs was reduced (by about 10%) 
and some formerly split episodes were fused. 
Although episodes of short duration
and of low frequency, in particular, are often better recalled
when the interviews are not very distant in
time, East Germans reported unemployment more
frequently in the five-year follow-up. This is
clearly because unemployment was initially a
totally new phenomenon for East Germans: it
was conceptually unclear, but also individually
shameful. Five years later, neither of these
conditions held anymore: unemployment
had become a very public and collective
phenomenon and was, therefore, mentioned
more often. As to timing (in months) very short
deviations dominate the results and, thus, are 
not very consequential for substantive analy- 
ses. An interesting result emerged also in rela- 
tion to the fall of the Wall. As an historical 
anchor it should have been especially useful for 
recall (Loftus and Marburger, 1983). In the Ger- 
man case, it, however, also led to some recall 
confusion, since German unification actually
spanned the period between the fall of the Wall in November
1989 and formal state unification in October
1990. Since the fall of the Wall seems to
be the stronger anchor, some respondents dated
episodes of the autumn of 1990 erroneously back
to 1989. Altogether, the less conventional and 
more complex a life, the more likely there will
be recall error. This is an important finding
because it introduces a systematic bias into 
retrospective measurement, which one has to 
take into account in interpreting such data. The 
complement in prospective studies is that such 
respondents are more likely to drop out or can- 
not be reached. The more institutionalized, the 
better the recall. For instance, respondents
recall the duration of employment episodes better
than that of nonemployment episodes. A further
important finding relates to the interrelatedness 
of errors and of reliability. Reimer could show 
that errors in timing are rare, once the type and 
sequence of episodes is properly recalled. This 
seems to corroborate a tenet of the psychology 
of autobiographical memory, i.e., that chrono- 
logical calendars as such do not operate as the 
basis of recall and retrieval, but rather the sub- 
stantive structure and sequences of episodes. 
This clearly gives some boost to the life-domain 
specific collection of life-course data in contrast 
to life-history calendars. The comparison with 
register data corroborates the recall risk of nonstandard
episodes, like unemployment, which
are markedly less well remembered (or, probably,
revealed) in the survey than they are recorded in
register data. In our case it
also demonstrated that, in an employment register,
information on family events was much more faulty than
in the retrospective survey.

In the GLHS, various features of the data 
collection procedure provide beneficial sup- 
port for the recall task: single events are 
recalled within thematic domains in forward 
chronological order, extensive opportunities 
for cross-references of memories are provided, 
complete sequences are collected which avoids 
boundary effects, calendar dates are recon- 
structed from the life-history context and partly 
anchored with a landmark event. Moreover, 
in every part of the study, methodological 
innovations were being developed, introduced 
and empirically tested, and opportunities to 
check data reliability and validity were used 
or intentionally created. This concerns various 
aspects of data collection (representativeness,
nonresponse, interviewer effects, retrospective
measurement, recall errors), data editing, and
data linkage. 

Nevertheless, an examination of reliability 
and external validity shows that in retrospect, 
respondents omit or insert, merge and dis- 
sect, temporally stretch or compress episodes, 
label episodes or transitions differently, and 
move transitions forward and backward in time. 
This leads to an overall reduction in com- 
plexity and change in the recalled life course. 
Observed discrepancies seem to be influenced 
by the individual and institutional life course 
and recall contexts in which an event occurs 
and is remembered, maybe more than by the 
time elapsed since an event. Such error can 
be greatly reduced by meticulous and labor-intensive
single-case data editing. This, however,
depends on which aspect of error we look
at (Reimer, 2005b). 

In the GLHS we have also introduced and 
evaluated new features into the data collec- 
tion procedure that emphasize and strengthen 
the role of the interviewer as interface between 
social scientific concepts, data and standard- 
ization requirements, and memory processes. 
A special data collection instrument allows 
interviewers to act as “reconstruction guides” 
and effectively help respondents reconstruct 
episodes as intended and date them correctly 
and consistently. Preliminary results indicate 
that this reduces editing costs and improves 
data quality (Reimer and Matthes, 2006; for 
similar instrument development, see Belli, 
1998; Belli et al., 1999; Belli et al., 2001). 

In sum, one clearly cannot assume naively 
that retrospective data is free of recall error (see 
also Grotpeter, Chapter 7 in this volume), but, 
among other considerations, the drop of such 
errors after careful data editing shows that the 
quality of the survey process is the predominant 
factor in regard to data quality. The question is, 
therefore, not whether one type of data is error 
free and the other is not, but which different
errors must be reckoned with in different kinds
of longitudinal data. 


5 Substantive areas 
and major findings 


5.1 Core empirical areas 


The core substantive areas which overlapped in 
all of the surveys of the GLHS were: residen- 
tial history of locations, parental background, 
siblings (including their schooling), school tra- 
jectory, vocational and professional training, 
further education, employment and occupa- 
tional history (including sector, firm size, labor 
income), marital history, fertility history and 
education of children, education and careers 
of spouses and partners, denominational his- 
tory, current household income. Areas covered 
in only one or in fewer than all of the surveys
are residential history on the level of apart- 
ments, second jobs and marginal employment, 
nature of employment interruptions, history of 
cohabitation and partnerships, illnesses, con- 
trol beliefs, subjective occupational mobility, 
history of marital satisfaction, life satisfaction, 
sociopolitical attitudes, types and intensity of 
social relationships. 


5.2 Major findings and publications 


Stratification and intergenerational
social mobility

In accordance with the initial goals of the 
study, investigations on class structure, stratification,
social mobility and status attainment,
their mechanisms and changes over time in the 
German context and in cross-national perspec- 
tive were a major focus of the analyses based 
on GLHS data (Mayer and Aisenbrey, forthcom- 
ing; Mayer and Carroll, 1987; Mayer and Solga, 
1994). In another aspect of this work the effects 
of educational expansion on inequality of edu- 
cational and occupational opportunity were 
assessed (Henz and Maas, 1995). These studies 
corroborate on the one hand the pervasive struc- 
turing of socioeconomic inequalities in (West)
Germany, not least via a selective educational
system, the persistence of the apprenticeship 
and academic exclusiveness, and on the other 
hand a weak trend of increasing social opportunities
(which might have come to a halt most
recently). 


The dynamics of careers 

Intragenerational mobility was a special focus 
of the GLHS in regard to job shifts, firm shifts, 
and occupational shifts. Beyond confirming the 
general finding that such shifts decline with 
duration in the labor force, these studies pro- 
vide ample evidence for the higher stability of 
German workers, their higher firm attachments, 
and the pervasive occupational structuring of 
careers. Increases in intragenerational shifts 
were more pronounced in the fifties and sixties 
than in the eighties and nineties (Allmendinger, 
1989; Carroll and Mayer, 1986). Marked cohort 
effects, e.g., for the 1964 baby boom cohort, 
could also be demonstrated which, however, 
were partially offset by political measures 
(Hillmert and Mayer, 2004). 


Education, training and the early career

Despite massive educational upgrading, the 
transition to the labor market proves to be 
remarkably robust during the 50 years which 
are spanned by the GLHS cohorts, and few 
effects of educational inflation and mismatches 
could be detected (Blossfeld, 1987a; Pollmann- 
Schult and Mayer, 2004). Participation in 
further education gradually increased across 
cohorts, but concentrated in a few years early 
in the careers. It also exacerbated rather than 
diminished educational inequalities across the 
life course (Becker and Schömann, 1996). However,
the transition phase between the end of
schooling and labor market integration became 
more checkered with more interruptions and 
multiple-training episodes (Jacob, 2004). Fixed- 
term contracts became more widespread at 
career entry, but are mostly beneficial for fur- 
ther advances. Huge problems have developed 
for unskilled youth (Solga, 2005). 


Residential migration, leaving home, family 
formation and dissolution 

The cohort studies of the GLHS cover marked 
discontinuities in family formation. War- 
related delays in marriage and child births 
were followed by a “golden age” of early mar- 
riage and many children, while starting with 
the 1950 cohort a massive decline in nuptial- 
ity and fertility can be observed. For an inter- 
vening period this process was marked by a 
polarization of family behavior according to the 
educational level of women (Huinink, 1995; 
Huinink and Mayer, 1995b). Leaving home has 
become even earlier for women, but the age at leaving remains
relatively high for men. Most importantly, as a
transition it is more split from other family 
events, like marriage. The interrelatedness of 
residential migration with other events in the 
domains of family or work and the importance 
of getting one’s own home was another topic 
for the GLHS (Kurz and Blossfeld, 2004; Wag- 
ner, 1989). While, as a part of the transitions 
to adulthood, the family sphere is marked by 
delays and a certain degree of disintegration 
(e.g. between cohabitation, marriage and child- 
bearing), changes in the sphere of work have 
shown remarkable stability over historical time 
(Brückner and Mayer, 2005).


Life courses in the post-Socialist 
transformation 

Using the retrospective data from the East 
German Life History Study, we discovered
a worsening of career opportunities and an 
increasing rigidity of the class structure in 
the former German Democratic Republic. This 
might have undermined the legitimacy of the 
Communist regime and have hastened its down- 
fall (Huinink and Mayer, 1995a; Solga, 1995; 
Trappe, 1995). In regard to impacts of the 
East German transformation from Socialism 
to Social Capitalism and of the unification 
with West Germany, we found enormous tur- 
bulence in work lives (including high expo- 
sures to unemployment), but high stability in
occupational activity, employment status, as
well as family-related social relations. Another 
remarkable finding was the strong effect transi- 
tion experiences had in changing control beliefs 
which are usually assumed to be highly stable 
and especially salient in times of external crisis 
(Diewald et al., 2006). 


Cross-national comparative studies 

The data of the GLHS has been extensively used 
in cross-national comparative studies, among 
others in Allmendinger’s pathbreaking study on
career dynamics in Germany, the US and 
Norway, in Hillmert’s study on education, train- 
ing and labor market entry in the UK and West 
Germany (Hillmert, 2001), and the series of 
studies resulting from the GLOBALIFE Project 
(e.g., Blossfeld et al., 2005; Blossfeld and Timm, 
2003; Drobnic and Blossfeld, 2001; Kurz and 
Blossfeld, 2004). 


6 Data access and documentation 


6.1 Documentation 


Basic information in German and English, 
extensive documentation in German as well 
as lists of publications based on data from 
the GLHS or other work related to the GLHS 
project can be accessed at the following Internet 
sources: 


- The home page of the Max Planck Institute
for Human Development, Berlin (Germany):
http://www.mpib-berlin.mpg.de/forschung/bag/projekte/lebensverlaufsstudie/index.htm

- The home page of the Center for Research
on Inequalities and the Life Course at Yale
University, New Haven (USA):
http://www.yale.edu/ciqle/GLHSINDEX.htm


6.2 Data access 


The data from the surveys of the German 
Life History Study are publicly available for 
scientific research. Requests for the data can
be addressed to: Zentralarchiv für Empirische
Sozialforschung, Bachemerstrasse 40, D-50869
Köln (Germany): http://www.gesis.org/za or
Center for Research on Inequalities and the Life
Course (CIQLE), Yale University, 140 Prospect
Street, P.O. Box 208265, New Haven, CT
06520-8265: http://www.yale.edu/ciqle or by email
at: ciqle@yale.edu or sarah.gelo@yale.edu

Manuals and extensive documentation in 
printed form or as CDs are available on request 
from Redaktion, Max Planck Institute for 
Human Development, Lentzeallee 94, D-14195
Berlin (Germany).


Glossary


Cohort studies A cohort is a population 
defined by a specific originating event, like 
birth in a given year or graduating from high 
school in a given year. Cohort studies single 
out one or several cohorts and follow them over 
time (prospectively or retrospectively). Cohort 
effects assume that experiences early in life 
and specific to a cohort or significant parts of it
have consequences throughout later life, such 
as the opportunities restricted by a large cohort 
size (“baby boom”) or the labor market condi- 
tions at career entry. 


Life course By the term “life course” soci- 
ologists denote the sequence of activities or 
states and events in various life-domains which 
span from birth to death. The life course is 
thus seen as the embedding of individual lives 
into social structures primarily in the form of 
their partaking in social positions and roles, 
i.e., with regard to their membership in institu- 
tional orders. The sociological study of the life
course, therefore, aims at mapping, describing
and explaining the synchronic and diachronic 
distribution of individual persons into social 
positions across the lifetime. One major aspect 
of life courses is their internal temporal ordering,
i.e., the relative duration times in given
states as well as the age distributions at various 
events or transitions. 


Quantitative life histories Life histories can 
be quantified by distinguishing episodes and 
states, such as being employed or being mar- 
ried, their historical dating and duration. On 
this basis events and transitions can be com- 
puted as survival distributions, hazard rates 
and density functions. Quantitative life histo- 
ries capture the temporal and positional struc- 
ture of life courses. 
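
As an illustration only, the following Python sketch computes a simple discrete-time life table from episode durations. It assumes durations measured in whole time units (e.g., months) and treats a censored episode as still at risk in its last observed unit; the function name and the example data are hypothetical and do not come from the GLHS.

```python
def life_table(durations, observed):
    """Discrete-time life table from episode durations.

    durations: length of each episode in whole time units (e.g. months)
    observed:  True if the episode ended with the event, False if censored
    Returns rows of (t, at_risk, events, hazard, survival, density).
    """
    rows = []
    survival = 1.0
    for t in range(1, max(durations) + 1):
        # Episodes lasting at least t units are still at risk at t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events are episodes that end at t and are not censored.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        hazard = events / at_risk if at_risk else 0.0
        density = survival * hazard   # probability of the event exactly at t
        survival *= 1.0 - hazard      # probability of no event through t
        rows.append((t, at_risk, events, hazard, survival, density))
    return rows


# Hypothetical employment spells: five episodes, two of them censored.
durations = [1, 2, 2, 3, 3]
observed = [True, True, False, True, False]

table = life_table(durations, observed)
for t, at_risk, events, hazard, survival, density in table:
    print(t, at_risk, events, round(hazard, 3), round(survival, 3), round(density, 3))
```

The hazard rate at each time point is the number of events divided by the number still at risk, the survival distribution is the running product of one minus the hazard, and the density is the probability that the event occurs exactly at that time point.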


Retrospective research design In a retrospec- 
tive research design a well-defined population, 
like a cross-section or birth cohort, is sampled 
and their past history is recorded by relying on 
the recalling of events and activities. Retrospec- 
tive measurement can be forward in time, like 
following an occupational career from the first 
to the present position, or it can be backward in 
time from the time-point of the interview, like 
tracing marriage histories from the present to 
the first spouse. 


References 


Allmendinger, J. (1989). Career mobility dynamics: 
A comparative study of the United States, Norway 
and Germany, Max Planck Institute for Human 
Development, Berlin. 

Allmendinger, J., Brückner, H., and Brückner, E.
(1993). The production of gender disparities over 
the life course and their effects in old age. Results 
from the West German Life History Study. In: 
A. B. Atkinson, M. Rein (Eds.), Age, Work and 
Social Security, St. Martin’s Press, New York, NY. 

Becker, R., and Schömann, K. (1996). Berufliche
Weiterbildung und Einkommensentwicklung,
Kölner Zeitschrift für Soziologie und Sozialpsychologie
48(3), 426-461.

Belli, R.F. (1998). The structure of autobiographical 
memory and the event history calendar: Potential
improvements in the quality of retrospective
reports in surveys, Memory 6(4), 383-406. 

Belli, R.F., Shay, W., and Stafford, F. (1999). 
Computerized event history calendar methods: 
A demonstration of features, functions, and flexi- 
bility. Paper presented at the annual meeting 
of the American Association for Public Opinion 
Research, St. Pete Beach, Florida. 

Belli, R.F., Shay, W., and Stafford, F. (2001). 
Event history calendars and question list surveys. 
A direct comparison of interviewing methods, Pub- 
lic Opinion Quarterly 65(1), 45-74. 

Blossfeld, H-P. (1987a). Kohortendifferenzierung
und Karriereprozeß: Eine Längsschnittstudie über
die Veränderung der Bildungs- und Berufschancen
im Lebenslauf, Campus, Frankfurt am Main.

Blossfeld, H-P. (1987b). Zur Repräsentativität der
Sfb-3-Lebensverlaufsstudie: ein Vergleich mit
Daten aus der amtlichen Statistik, Allgemeines
Statistisches Archiv 71, 126-144.

Blossfeld, H-P., Klijzing, E., Mills, M., and Kurz, K. 
(2005). Globalization, Uncertainty, and Youth in 
Society, Routledge, New York. 

Blossfeld, H-P., and Timm, A. (2003). Who marries 
whom: Educational systems as marriage markets 
in modern societies, Kluwer, Dordrecht.

Brewer, W.F. (1986). What is autobiographical mem- 
ory? In: D.C. Rubin (Ed.), Autobiographical Mem- 
ory, Cambridge University Press, Cambridge, UK. 

Brückner, E. (1993). Lebensverläufe und
Gesellschaftlicher Wandel. Konzeption, Design
und Methodik der Erhebung von Lebensverläufen
der Geburtsjahrgänge 1919-1921, Max-Planck-
Institut für Bildungsforschung, Berlin (Materialien
aus der Bildungsforschung 44).

Brückner, E., and Mayer, K.U. (1998). Collecting
life history data: Experiences from the German 
Life History Study. In: J.Z. Giele, G.H. Elder 
(Eds.), Methods of Life Course Research: Qualita- 
tive and Quantitative Approaches, Sage Publica- 
tions, Thousand Oaks, CA, pp. 152-181. 

Brückner, H. (1995). People don’t lie, Surveys do?
An analysis of data quality in a retrospective
life course study. Materialien aus der Bildungsforschung
Nr. 50, Max-Planck-Institut für Bildungsforschung,
Berlin.

Brückner, H., and Mayer, K.U. (1995). Lebensverläufe
und Gesellschaftlicher Wandel. Konzeption,
Design und Methodik der Erhebung von
Lebensverläufen der Geburtsjahrgänge 1954-1956
und 1959-1961. Max-Planck-Institut für
Bildungsforschung, Berlin (Materialien aus der
Bildungsforschung 48).

Brückner, H., and Mayer, K.U. (2005). The
destandardization of the life course: What it might
mean and if it means anything whether it actually
took place, Annual Review of Life Course Research
205(9), 27-54.

Carroll, G.R., and Mayer, K.U. (1986). Job-shift pat- 
terns in the Federal Republic of Germany: The 
effects of social class, industrial sector, and orga- 
nizational size, American Sociological Review 51, 
323-341. 

Conway, M.A., and Pleydell-Pearce, C.W. (2000). The 
construction of autobiographical memories in the 
self-memory system, Psychological Review 107(2), 
261-288. 

Diewald, M., Goedicke, A., and Mayer, K.U. (2006). 
After The Fall of the Wall. Life Courses in the 
Transformation of East Germany, Stanford Univer- 
sity Press, Stamford. 

Drobnic, S., and Blossfeld, H-P. (2001). Careers of 
Couples in Contemporary Societies, Oxford Uni- 
versity Press, Oxford. 

Featherman, D.L. (1979). Retrospective Longitudinal 
Research: Methodological Considerations. Journal 
of Economics and Business 32(2), 152-169. 

Goedicke, A., Matthes, B., Lichtwart, B., and 
Mayer, K.U. (2004). Dokumentationshandbuch — 
Ostdeutsche Lebensverléufe im  Transforma- 
tionsprozess. Max-Planck-Institut fiir Bildungs- 
forschung Berlin. 

Graaf, P. de. (1987). Intergenerationale klassenmo- 
biliteit in Nederland tussen 1970 en 1986. Mens 
en Maatschappij 62, 209-221. 

Groves, R.M. (1989). Survey Errors and Survey Costs, 
Wiley, New York. 

Henz, U., and Maas, I. (1995). Chancengleichheit 
durch die Bildungsexpansion. Kolner Zeitschrift 
fiir Soziologie und Sozialpsychologie 47(4), 
605-633. 

Hillmert, S. (2001). Ausbildungssysteme und 
Arbeitsmarkt: Lebensverléufe in GroSbritannien 
und Deutschland im _ Kohortenvergleich. 
Westdeutscher Verlag, Wiesbaden. 

Hillmert, S. (2002). Die Edition von Lebensver- 


laufsdaten -—  Einzelfallpriifungen, Korrek- 
turentscheidungen und ihre Relevantz, 
Max-Planck-Institut fiir Bildungsforschung, 


Berlin. (Projekt Ausbildungs- und _ Berufsver- 
laufe der Geburtskohorten 1964 und 1971 in 
Westdeutchland. Arbeitspapier 20). 

Hillmert, S., and Mayer, K.U. (2004). Geboren 
1964 und 1971, Verlag fiir Sozialwissenschaften, 
Wiesbaden. 

Huinink, J. (1995). Warum noch Familie? Zur 
Attraktivitét von Partnerschaft und Elternschaft in 
unserer, Campus, Gesellschaft. Frankfurt a.M. 


Retrospective longitudinal research 105 


Huinink, J., and Mayer, K.U. et al. (1995a). Kollektiv 
und Eigensinn: Lebensverléufe in der DDR und 
ach. Akademie-Verlag, Berlin. 

Huinink, J., and Mayer, K.U. (1995b). Gender, social 
inequality, and family formation in West Ger- 
many. In: K.O. Mason, A-M. Jensen (Eds.), Gender 
and Family Change in Industrialized Countries, 
pp. 168-199. Clarendon Press, Oxford. 

Jacob, M. (2004). Mehrfachausbildung in Deutsch- 
land: Karriere, Collage, Kompensation? VS - Verlag 
fiir Sozialwissenschaften, Wiesbaden. 

Jonsson, J.O., and Mills, C. (2002). From Craddle 
to Grave: Life-course change in modern Sweden. 
Sociologypress, Durham. 

Kurz, K., and Blossfeld, H-P. (2004). Home Owner- 
ship and Social Inequality in Comparative Perspec- 
tive. Stanford University Press, Stanford. 

Loftus, E., and Marburger, W. (1983). Since the erup- 
tion of Mt. St. Helens, has anyone beaten you up? 
Improving the accuracy of retrospective reports 
with landmark events. Memory & Cognition 11(2), 
114-120. 

Mach, B.W. (2003). Pokolenie historycznej nadziei 
i codziennego ryzyka. Spoleczne losy osiemnasto- 
latkow zroku 1989 [Generation of Historic Hope 
and Everyday Risk. Social Trajectories of Eighteen- 
year Olds from the Year 1989]. Institute of Polit- 
ical Studies of the Polish Academy of Sciences, 
Warsaw. 

Mayer, K.U. (1990). Lebensverlaéufe und sozialer 
Wandel: Anmerkungen zu einem Forschungspro- 
gramm. In K.U. Mayer (Ed.), Lebensverlaufe und 
sozialer Wandel. K6élner Zeitschrift fiir Soziolo- 
gie und Sozialpsychologie: Sonderhefte 31, 7-21. 
Westdeutscher Verlag, Opladen. 

Mayer, K.U. (2000). Promises fulfilled?: A review 
of 20 years of life course research, Archives 
Européennes de Sociologie 41, 259-282. 

Mayer, K.U. (2001). The paradox of global social 
change and national path dependencies: life course 
patterns in advanced societies. In: A. Woodward & 
M. Kohli (Eds.), Inclusions and Exclusions in Euro- 
pean Societies, Routledge, London, pp. 89-110. 

Mayer, K.U. (2004). Whose lives? How history, 
societies, and institutions define and shape life 
courses, Research in Human Development 1(3), 
161-187. 

Mayer, K.U., and Aisenbrey, S. (forthcoming). Varia- 
tions on a major theme — Trends in social mobil- 
ity in (West-) Germany for cohorts born between 
1919 and 1971. In: M. Gang] und S. Scherer (Hrsg.), 
Festschrift fiir Walter Miiller, Campus Verlag, 
Frankfurt. 

Mayer, K.U., and Baltes, P.B. (Eds.). (1996). Die 
Berliner Altersstudie, Akademie Verlag, Berlin. 


106 Handbook of LongitudihdP Regaa RY: https:/jafrilibrary.com 


Mayer, K.U., and Briickner, E. (1989). Lebensver- 
laufe und Wohlfahrtsentwicklung: Konzeption, 
Design und Methodik der Erhebung von 
Lebensverlaufen der Geburtsjahrgaénge 1929-1931, 
1939-1941, 1949-1951. Max-Planck-Institut fur 
Bildungsforschung, Berlin (Materialien aus der 
Bildungsforschung 35). 

Mayer, K.U., and Carroll, G.R. (1987). Jobs and 
classes: structural constraints on career mobility. 
European Sociological Review 3, 14-38. 

Mayer, K.U., and Huinink, J. (1990). Age, period, and 
cohort in the study of the life course: A compari- 
son of classical A-P-C-analysis with event history 
analysis or farewell to Lexis?. In: D. Magnusson, 
L.R. Bergman (Eds.), Data Quality in Longitudinal 
Research, Cambridge University Press, Cambridge. 

Mayer, K.U., and Miiller, W. (1986). The state and 
the structure of the life course. In: A.B. Sorensen, 
F.E. Weinert, L.R. Sherrod (Eds.), Human Develop- 
ment and the Life Course: Multidisciplinary Per- 
spectives. Erlbaum, Hillsdale, NJ, pp. 217-245. 

Mayer, K.U., and Solga, H. (1994). Mobilitat und 
Legitimitat: zum Vergleich der Chancenstruk- 
turen in der alten DDR und der alten BRD 
oder: Haben Mobilitaétschancen zu Stabilitat und 
Zusammenbruch der DDR beigetragen?, K6lner 
Zeitschrift fiir Soziologie und Sozialpsychologie 
46, 193-208. 

Pollmann-Schult, M., and Mayer, K.U. (2004). 
Returns to skills: Vocational training in Germany 
1935-2000, Yale Journal of Sociology 4, 73-98. 

Reimer, M. (2001). Die Zuverlassigkeit des auto- 
biographischen Gedachtnisses und die Valid- 
itaét retrospektiv erhobener Lebensverlaufsdaten. 
Kognitive und erhebungspragmatische Aspekte. 
Max-Planck-Institut fiir Bildungsforschung, Mate- 
rialien aus der Bildungsforschung Nr. 71, 
Berlin. 

Reimer, M. (2005a). Autobiografisches Gediachtnis 
und retrospektive Datenerhebung. Die Rekonstruk- 
tion und Validitét von Lebensverlaufen, Max- 
Planck-Institut fiir Bildungsforschung, Studien 
und Berichte Nr. 70, Berlin. 

Reimer, M. (2005b). Collecting Event History Data 
About Work Careers Retrospectively: Mistakes that 
Occur and Ways to Prevent Them. Max Planck 
Institute for Human Development, Berlin. 


Reimer, M., Matthes, B., and Pape, S. (forthcoming). 
Collecting Event Histories with True Tales: Tech- 
niques to Improve Autobiographical Recall Prob- 
lems in Standardized Interviews, Quality and 
Quantity. 

Rubin, D.C. (Ed.) (1986). Autobiographical Memory. 
Cambridge University Press, Cambridge. 

Rubin, D.C. (Ed.) (1996). Remembering Our Past: 
Studies in Autobiographical Memory. Cambridge 
University Press, Cambridge. 

Schwarz, N., and Sudman, S. (Eds.) (1996). Answer- 
ing Questions: Methodology for Determining Cog- 
nitive and Communicative Processes in Survey 
Research, Jossey-Bass, San Francisco. 

Solga, H. (1995). Auf dem Weg in eine klassenlose 
Gesellschaft?: Klassenlagen und Mobilitaét zwis- 
chen Generationen in der DDR. Akademie-Verlag, 
Berlin. 

Solga, H. (1996). Lebensverlaéufe und Historischer 
Wandel in der Ehemaligen DDR. ZA-Information 
38, 28-38. 

Solga, H. (2001). Longitudinal surveys and the study 
of occupational mobility: Panel and retrospective 
design in comparison, Quality and Quantity 35(3), 
291-309. 

Solga, H. (2005). Ohne Abschluss in die Bil- 
dungsgesellschaft: die Erwerbschancen § gering 
qualifizierter Personen in soziologischer und 
Okonomischer Perspektive, Barbara Budrich, 
Opladen. 

Sudman, S., Bradburn, N.M. (1996). Thinking About 
Answers: The Application of Cognitive Processes 
to Survey Methodology, Jossey-Bass Publishers, 
San Francisco. 

Trappe, H. (1995). Emanzipation oder Zwang?: 
Frauen in der DDR zwischen Beruf, Familie und 
Sozialpolitik, Akademie-Verlag, Berlin. 

Wagner, M. (1989). Raéumliche Mobilitét im 
Lebensverlauf. Enke, Stuttgart. 

Wagner, M. (1996) Lebensverlaufe und 
Gesellschaftlicher Wandel. Die Westdeutschen 
Teilstudies. ZA-Information 38, 20-27. 

Wehner, S. (2002). Exploring Trends and Patterns 
of Nonresponse: Evidence from the German Life 
History Study. Essex Summer School in Social Sci- 
ence Data Analysis and Collection, University of 
Essex, Colchester. 


Presented by: https://jafrilibrary.com 
Part II 


Measurement Issues in Longitudinal 
Research 




Chapter 7


Respondent recall 
Jennifer K. Grotpeter 


1 Introduction 


How did we become the people we are today? 
An interest in human development is as old 
as humanity itself, and the scientific study of 
human development began with the publication
of Charles Darwin’s On the Origin of Species
in 1859. Since then, the evidence used to
support developmental theories has evolved 
from field observation notes to complex lon- 
gitudinal studies following infants and chil- 
dren into adolescence and adulthood. As the 
life course perspective of human development 
has progressed since the 1960s, researchers 
have become increasingly aware that our lives 
are individually shaped by the timing and 
sequencing of our life events (e.g., Elder and 
O’Rand, 1995). 

In order to conduct scientific inquiry assess- 
ing those life events, clearly researchers must 
be able to measure study participants’ char- 
acteristics, attitudes, and behavior through- 
out the lifespan. Because it is impossible to 
follow study participants from birth to adult- 
hood, recording their every thought and behav- 
ior, along with everything that happens to them, 
researchers must rely on asking participants 
about their thoughts, behaviors, and the events 
that happened to them over some period of 
time. Whether this time period is the past 
24 hours, the past year, or the past 20 years, 
all developmental and life course research is 


dependent upon the ability of its subjects to 
accurately recall what they have previously 
thought and done. This chapter will explore 
issues in respondent recall in longitudinal sur- 
vey research, focusing particularly on issues of 
reliability and validity in short-term and long- 
term retrospective recall. 


2 Respondent recall — issues of 
memory 


Research on memory indicates that we are 
unlikely to completely forget things that 
we have directly experienced, though we 
may experience problems with recalling these 
events accurately (Fowler, 1998). Megan 
Beckett and colleagues, in 2001, reviewed the 
literature on error in respondent recall, dis- 
tinguishing between retrieval of autobiographi- 
cal memories, recall strategies and recall error, 
and respondent characteristics associated with 
recall error. The following discussion of recall 
error is based on the framework used by 
Beckett, et al. (2001), and is supplemented by 
other reviews. 


2.1 Retrieval of autobiographical memories 


In their review, Beckett and colleagues (2001) 
found that researchers have identified four 
steps in the reporting process, which is the 
same for retrieval of past events, current sta- 
tus reports, and attitudinal questions: (1) the 
respondent interprets the question; (2) the
respondent retrieves the answer to the question 
or information needed to construct an answer; 
(3) the respondent formulates an answer based 
on the recalled information; and (4) the respon- 
dent edits the answer and decides whether 
and how to respond. Most research on recall 
error focuses on the second part of the pro- 
cess, memory retrieval. The most obvious recall 
problem, “forgetting,” occurs when events were 
never processed, encoded, or stored into long- 
term memory in the first place. Alternatively, 
they argue that the major event may be firmly 
encoded into memory, but the finer details may 
not be easily recalled and are gradually lost over 
time. 

The second recall issue that Beckett and col- 
leagues (2001) discussed is that the event may 
not be recalled due to retrieval error, which
means that the event is stored in memory, but
so much time has passed since it was last
“rehearsed” (i.e., remembered, thought about,
or talked about) that the memory has become
too difficult to access. Birthdates and wedding
anniversaries are generally rehearsed regularly, 
whereas the date one had coffee with a friend is 
not. The mental processes involved in rehearsal 
are thought to strengthen the pathway to the 
memory, which in turn increases the ease of 
later recall. Third, the review found that recall
problems may stem from the inaccurate reconstruction
of a memory, as when other similar
events “overwrite” the initial memory.

In reviews, Fowler (1998), Charles Pier- 
ret (2001), and Beckett and colleagues (2001) 
assert that another threat to accurate recall is 
the length of the recall period (i.e., the time 
between the event and the time it is reported). 
A longer recall period increases the quantity 
of information the respondent must retrieve 
from memory, making it more difficult to do 
so, and the high quantity of information in turn 
increases the likelihood that the respondent 
will be unable to distinguish between various 
events, confusing details between two events or 


forgetting one event entirely (Pierret, 2001). In 
their First and Second Malaysian Family Life 
Surveys (MFLS-1 and MFLS-2), Beckett and 
colleagues (2001) studied ever-married women 
of childbearing age and their husbands. One
sample was drawn for the first survey, and
72% of the respondents in that sample
were re-interviewed for the second survey, at which
time a second sample was newly drawn and
the respondents were given retrospective interviews
on topics such as breastfeeding their children.
Results indicated that data quality did
deteriorate with the length of the recall period. 
They reported that some events were forgotten 
and details about other events were “blurred,” 
resulting in incomplete or inexact reports. 

The time of recall is not simply a linear issue, 
however. Fowler (1998) notes that the greater 
the impact or current salience of the event, the 
more likely it is to be recalled. Similarly, in 
their literature review, Beckett and colleagues 
(2001) found that the rate of decline in recall 
ability over time varied by type of event. That 
is, recall of some types of events that were more
infrequent and salient, such as annual physicals
and reporting of robberies, did not decay
over time, whereas reports of assaults, burglaries,
and larcenies did suffer from memory decay
over time. They note in particular that in these 
instances, events in the distant past are prone to 
suffer from “heaping,” meaning that if respon- 
dents could not recall an actual value, they 
provided a “prototypical” response nearest the 
actual value, which results in an artificially 
high number for the given response period. 
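
One simple way to screen reported counts for heaping is to compare the share of answers that fall on round, "prototypical" values with the share that would be expected if answers were spread evenly over the observed range. The Python sketch below is illustrative only: the function name, the choice of multiples of 5 as the prototypical values, and the sample responses are assumptions made for this example and do not come from the studies reviewed here.

```python
from collections import Counter


def heaping_ratio(responses, base=5):
    """Observed share of answers on multiples of `base`, divided by the
    share expected if answers were spread evenly over the observed range.

    A ratio well above 1 suggests heaping on round values.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    low, high = min(responses), max(responses)
    # Every integer value in the observed range, and the round ones among them
    candidates = range(low, high + 1)
    round_values = [v for v in candidates if v % base == 0]
    if not round_values:
        return 0.0
    observed_share = sum(counts[v] for v in round_values) / len(responses)
    expected_share = len(round_values) / len(candidates)
    return observed_share / expected_share


# Hypothetical reports of "number of doctor visits in the past year"
reports = [3, 5, 5, 10, 10, 10, 7, 5, 12, 10, 15, 5, 10, 2, 20]
print(round(heaping_ratio(reports), 2))
```

A high ratio is only a warning sign: some quantities cluster on round values for substantive reasons, so a screen of this kind would normally be read alongside what is known about the event being reported.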

Though salience may seem to be related to 
the rehearsal issue discussed above, it is dis- 
tinct in that salience depends upon the relation- 
ship of the event to the respondent’s lifestyle 
or self-identity. For example, Beckett cites her 
own 2000 study of older Taiwanese, in which 
the most important predictor that a health condition
would be reported at both time periods
studied was whether that condition strongly
affected respondents’ daily lives, such that it affected
walking or bathing (Beckett, Weinstein, and
Yu-Hsuan, 2000). Beckett, et al. (2001) also 
found using their Malaysian data that among 
men and women, moves that were associated 
with highly salient life events (e.g., new jobs for 
men or marriage for women) were more likely 
to be consistently reported 12 years later. 

A third recall problem is referred to as tele- 
scoping (Beckett, et al., 2001; Pierret, 2001). 
Telescoping is said to occur when more events 
are recalled as having occurred in the most 
recent period and fewer events in the more dis- 
tant past. Three factors Beckett and colleagues 
(2001) report as contributing to telescoping are 
that (1) normal forgetting is greater for events 
that occurred in the more distant past; (2) errors 
in dating events that may otherwise be random 
or unbiased increase over time; and (3) events 
that occurred outside the reporting period may 
intrude upon the recall of events in the reporting
period (e.g., respondents may be asked about
the number of times they argued with their 
spouse in the past six months, but they may 
include arguments that occurred between six 
and twelve months ago). The second and third 
factors discussed above may then lead to over- 
estimation of the number of events in a recent 
reporting period. In their 2001 Malaysian study,
Beckett and colleagues confirmed that there was a
slight telescoping of events in the second survey
compared with reports of the same events in the
first survey 12 years earlier. A similar issue dis- 
cussed in the review is referred to as the “avail- 
ability heuristic,” or “availability principle.” 
This term indicates that if an event is easily 
recalled, the respondent may believe it hap- 
pened with more frequency than if the event 
were more difficult to recall. 
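
The intrusion of out-of-period events can be made concrete by comparing a count based on true event dates with a count based on the dates respondents report. The following Python sketch is a toy illustration under assumed data: the function names and the example timings are hypothetical, and in practice true dates are rarely available outside validation studies.

```python
def count_in_period(months_ago, period_months):
    """Count events dated within the last `period_months` months."""
    return sum(1 for m in months_ago if 0 <= m < period_months)


def net_telescoping(true_months_ago, reported_months_ago, period_months):
    """Reported count minus true count for the reference period.

    A positive value means that, on balance, events were pulled forward
    into the reference period (forward telescoping).
    """
    return (count_in_period(reported_months_ago, period_months)
            - count_in_period(true_months_ago, period_months))


# Hypothetical arguments with a spouse: months since each event,
# as it truly occurred and as the respondent remembers it
true_timing = [1, 3, 5, 7, 8, 11]
reported_timing = [1, 3, 5, 5, 4, 11]  # two older events remembered as more recent

print(net_telescoping(true_timing, reported_timing, period_months=6))
```

In this made-up case three events truly fall in the past six months, but five are reported there, so the six-month count is overstated by two.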

Finally, when a respondent is unable to recall 
events clearly, several strategies are regularly 
used to aid in reconstructing the events, but 
some of these strategies themselves may lead to 
reporting bias (Beckett, et al., 2001). One com- 
mon strategy is to probe for individual instances 
of the event, but this may lead to “heaping.” 
Another strategy is to use information about
current attitudes or behavior to infer past atti- 
tudes or behavior, but this results in “stability 
bias,” which artificially diminishes potential 
changes in attitudes and behavior over time. 

Fowler (1998) states that the more consis- 
tent an event was with the way the respondent 
thinks about things, the more likely it is to be 
recalled. While retrospective reports were gen- 
erally found to be strongly influenced by cur- 
rent behaviors, Beckett and colleagues (2001) 
also found that in some instances, retrospec- 
tive reports may be more accurate than con- 
current reports when the issues are particularly 
sensitive, such as heavy drinking—respondents 
were more willing to acknowledge such events 
as having occurred in the distant past rather 
than as a current embarrassing behavior. In the 
Malaysian study, Beckett, et al. (2001) found 
that even reports about less socially sensitive
behaviors, such as breastfeeding, were responsive
to social norms, much like reports on more sensitive
topics (e.g., abortion or illegal drug use).

Finally, Beckett, et al. (2001) note that
certain sociodemographic subgroups, such as 
better-educated women, may be better able to 
recall past use of health care, and that this was 
more true for events in the more distant past 
than the more recent past. In their Malaysian 
studies, Beckett, et al. (2001) found that a cor- 
relate of data quality, such as education, in 
retrospective reports on one topic (e.g., details 
about a birth) is likely to be predictive of quality 
issues related to another topic (e.g., migration 
histories). It is also possible in a longitudi- 
nal survey that respondents become aware over 
the years that the more questions to which 
they respond affirmatively, the more follow-up 
questions they will be asked. They may elect 
to reduce their effort and not report certain 
events in order to be spared being asked addi- 
tional questions. Further discussion of this phe- 
nomenon of panel conditioning may be found 
in the next chapter (Cantor, Chapter 8 in this 
volume). 

Over thirty-five years ago, Marian Radke-Yarrow
and colleagues (Radke-Yarrow,
Campbell, and Burton, 1970) compared con- 
temporaneous and retrospective accounts of 
childhood behavior and experiences, and 
the results revealed biases and distortions 
that occur with research based on retro- 
spective recall. In 1998, Sir Michael Rutter 
and colleagues (Rutter, Maughan, Pickles, 
and Simonoff, 1998) conducted a follow-up 
literature review of studies that followed 
Radke-Yarrow’s work. In their review of literature
on memory, Rutter and colleagues (1998)
discuss the extent to which, assuming that 
accurate memories are available in the mind for 
recall, the reconstruction of memory introduces 
systematic bias. In particular, it is argued that 
people use “implicit theories” (personal mean- 
ing about specific phenomena) to reconstruct 
their personal histories, and there is strong 
evidence that reports of past behavior may be 
distorted to better reflect current feelings and 
attitudes. However, evidence is also presented 
that with skillful interviewing some of these 
biases may be overcome, and that distortion of 
memory is not necessarily inevitable. 


3 Respondent recall - issues in 
longitudinal research design 


Information gathered at multiple points in time 
is crucial for any research that aims to pre- 
dict future behavior based on past behavior and 
beliefs. Thus, multiple measurement points are
necessary for studies of lifespan development,
studies that are designed to assess developmental
pathways from childhood to adulthood,
and studies that require lifetime estimates of 
conditions, such as psychopathology (Rutter, 
Maughan, Pickles, and Simonoff, 1998). There 
are two primary ways in which researchers can 
assemble longitudinal datasets — through the 
use of prospective (longitudinal) panel designs, 
and through the use of cross-sectional retro- 
spective designs. Though both designs require 


respondents to think back over some amount of time
in order to generate a response, they necessitate
substantially different lengths of recall
time.


3.1 Prospective panel designs 


Many of the predictive antecedents of later 
behaviors or events in people’s lives are best 
measured at the time they occur. Thus, a lon- 
gitudinal design that follows a sample over 
time, concurrently measuring beliefs, events, 
and behavior, is ideal for assessment using a life 
course perspective. In prospective longitudinal, 
or panel, research, a subject pool is identified 
and baseline interview data are gathered. This
same subject pool is tracked over time and is 
then re-contacted and interviewed at specified 
intervals. At the conclusion of the study, opti- 
mally, data will be available for every subject at 
each point in time, so that potential changes can 
be examined over time, with relatively minimal 
reliance on the extreme long-term memories of 
the participants. 


3.2 Cross-sectional retrospective designs 


The second way that researchers can assem- 
ble a type of longitudinal dataset is by using 
a cross-sectional retrospective design, which is 
designed to significantly shorten the amount 
of time required to gather longitudinal data, 
as well as to conserve financial and human 
resources. As the name implies, studies using 
this method combine a cross-sectional design 
with retrospective recall in order to produce 
longitudinal data. Thus, respondents are sur- 
veyed at one point in time and are asked to 
report about current and past events, behavior, 
and beliefs. For example, adults may be asked 
to think back to childhood and to report on 
their peer relationships or their mother’s par- 
enting style, or they may be asked to report 
the age at which they first drank alcohol, 
smoked cigarettes, or engaged in sexual behav- 
ior. Respondents may be asked to recall back for 
a length of time up to several decades (e.g., the 
Health and Retirement Study). As a result, in
one wave of data collection, a researcher gains 
access to data that refer to multiple points in
the respondents’ lives, potentially over a very 
long time span, that would otherwise take many 
years and a great deal of financial and human 
resources to collect. 

It is important to note that researchers 
must consider respondent recall issues even 
in prospective research because these studies 
gather data on time periods ranging from the 
past few months (e.g., Survey of Income and 
Program Participation), to the past year (the 
Panel Study of Income Dynamics, the National 
Youth Survey), to several years (e.g., Wisconsin 
Longitudinal Study). The important question is, 
what is the length of time over which events can 
be recalled with reasonable reliability? Scott 
and Alwin (1998) argue that the optimal design 
for gathering life history data is one that com- 
bines the best features of prospective and retro- 
spective measurement. As a result, understand- 
ing the issues in respondent recall is crucial 
for all longitudinal researchers, including those 
who conduct prospective research, and for any- 
one desiring to comprehend the ramifications 
of the results of that research. 


3.3 Benefits and limitations — prospective 
designs 


Jacqueline Scott and Duane Alwin (1998) state 
the following benefits of prospective designs: 
(1) they allow for data to be collected con- 
currently with the events in question; (2) they 
allow for the continuous measurement of events 
and changes that would be too burdensome in 
retrospective studies; (3) it is much easier for 
researchers to impose theoretically driven def- 
initions of life events that are ambiguous in 
practice, such as “leaving home”; (4) they pro- 
vide an opportunity to collect prospectively ori- 
ented data concerning individual aspirations 
and expectations and to compare these to actual 
outcomes at later points in time; and (5) they 
provide data on the interaction of individual
life trajectories within the household or
family context, whereas retrospective designs 
would be limited to only those households that 
survive. 

While the prospective panel research design 
is the ideal design for measuring change over 
time, there are some significant limitations. 
First, as Scott and Alwin (1998) argue, statis- 
tical problems accrue because of subject attri- 
tion or non-response, and this attrition may not 
be random. Second, it is possible that inter- 
viewing the same people over time affects the 
essence and quality of respondents’ answers. 
This concern with panel conditioning is some- 
what similar to the “Hawthorne effect” whereby 
the act of making people subjects in a social
experiment affects their subsequent behavior 
and makes them less “typical” (Scott and 
Alwin, 1998). 

Additionally, there are several practical lim- 
itations to prospective panel studies. First, 
these designs require substantial investments 
of financial and human resources, and they 
require a great deal of time (i.e., from years 
to several decades). Second, if the study con- 
tinues a decade or more, it is likely that new 
constructs will be identified and old constructs 
will change within the research field, and it 
is impossible to go back in time to the first 
year of the study and change or add questions 
to earlier surveys. Similarly, because it is the 
ideal in a prospective panel study for individual 
questions to remain constant across the waves 
of the study so that changes in responses to 
those questions can be measured over time, it is 
rare that a researcher would sacrifice this con- 
sistency to change the wording of a question, 
even if the reason were to improve a poorly 
worded question or to follow new develop- 
ments in research. Despite these limitations, 
prospective designs remain the most reliable 
and valid way to amass longitudinal data that 
can be used to assess temporality and causal 
relationships. 


3.4 Benefits and limitations — retrospective
designs


Retrospective research designs have two prac- 
tical benefits over prospective designs. First, 
financially, it is far less expensive to survey 
adult respondents at one time point to answer 
questions about their life experiences over the 
past twenty years than it is to identify, track, 
and survey the same people from childhood 
through adulthood to ask them the same ques- 
tions every year. Second, temporally, obtaining 
longitudinal results using a one-time retrospective
interview takes only as much time as it
takes to conduct the interview and analyze 
the data, compared with the years (and even 
decades) required before the data reward of a 
prospective longitudinal study is available. 

The most obvious drawback to retrospective 
research designs is that this method relies heav- 
ily on the accuracy of respondent recall, which 
is affected both by memory recall issues and 
the way in which respondents filter their memories
based on current beliefs. Additionally, these 
recall issues may negatively impact the recall of 
some types of events and behaviors more than 
others, and so it is important for the researcher 
considering this type of research design to be 
aware of the type of data they intend to col- 
lect and the degree to which it is susceptible 
to recall problems. Another drawback of ret- 
rospective designs is that they can result in 
selection biases because only survivors can be 
interviewed (Scott and Alwin, 1998). Finally, 
Scott and Alwin (1998) argue that recent expe- 
riences and events may bias the recollections 
people make about their earlier experiences, 
making inferences about trends or causation 
somewhat circular. 

Thus it is clear that with unlimited financial, 
temporal, and human resources, it is gener- 
ally better to collect information from indi- 
viduals prospectively, i.e., “to gather data on 
people’s lives as they are living them” (Scott 
and Alwin, 1998). Despite this, because of lim- 
ited resources, much criminological research 


makes use of retrospective recall in cross- 
sectional research designs, and it is impor- 
tant to understand the degree to which these 
data can provide accurate measures of the 
topics in question. “The challenge to those 
designing the optimal longitudinal measure- 
ment design, given considerations of cost and 
data quality, is to find the acceptable limits 
for gathering data retrospectively” (Scott and 
Alwin, 1998). 


4 Research evidence 


The following section contains brief 
descriptions of a variety of studies that used 
prospective or a combination of prospective 
and retrospective research designs, and that raise
or address issues in recall.

To address the debate about the utility of 
longitudinal compared to cross-sectional
research designs, Scott Menard and Delbert 
Elliott (1990) examined empirical evidence on 
the extent to which longitudinal and cross- 
sectional data could be used interchangeably 
without affecting substantive conclusions. Data 
were taken from the National Youth Survey 
(NYS), a nationally representative sample of 
adolescents who were aged 11 to 17 when the 
sample was drawn in 1976. The NYS used 
annual assessments for the first five years of 
the study. The next two assessments occurred 
three years after the last annual assessment, 
and then again three years after that. Though 
the focus for all seven waves of the study was 
on the previous year, at wave six some retro- 
spective data were also collected for the other 
two intervening years, which constituted two- 
and three-year recall periods. This allowed 
analysis of delinquency and drug use variables 
for a single wave (i.e., cross-sectional data) 
and also age-specific estimates for each birth 
cohort over multiple waves (i.e., longitudinal 
data collection). Results based on two different 
analyses comparing prospective data with 
short-term retrospective data indicated that in 
comparison with the prospective longitudinal 
panel, the retrospective cross-sectional data
collection that involved extended recall periods 
significantly underestimated the prevalence of 
delinquency and drug use. This underestima- 
tion was more serious for general delinquency 
and Index offenses than it was for marijuana 
use or polydrug use. 

Over longer periods, the problem of compar- 
ing long-term retrospective data with longitu- 
dinal data appeared to be even more severe. In 
the seventh wave, the respondents were asked 
about whether they had ever engaged in a series 
of offenses, including the Index offense scale, 
and, if they had, at what age they had first
committed the offenses. Of those who reported 
ever having committed an offense on either 
the prospective or the retrospective questions, 
about 10% failed to do so on the prospective 
surveys, and more than half failed to do so 
on the retrospective surveys. In two-thirds of 
the cases in which an offense was reported on 
either of the two measures, the two measures 
disagreed about whether or not an offense had 
ever been committed. In an additional 11% of 
these cases for which an offense was reported 
on either of the two measures, the two mea- 
sures disagreed on the age at which the respon- 
dent first committed the offense. Overall, it was 
only in one-fourth of the cases that prospec- 
tive and retrospective accounts agreed that an 
offense was committed at a particular age. The 
data used in this study demonstrated clearly 
that longitudinal data collected using cross- 
sectional methods with extended recall periods 
may produce very different results from longi- 
tudinal panel data collected prospectively. 
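
The comparison described above can be sketched in a few lines of code. This is only an illustration of the logic, not the procedure Menard and Elliott used: the function names, the outcome labels, and the five example respondents are all hypothetical, and `None` is assumed to mean that no offense was ever reported on that measure.

```python
from collections import Counter
from typing import Optional


def classify(prospective_age: Optional[int], retrospective_age: Optional[int]) -> str:
    """Compare one respondent's two accounts of age at first offense.

    None means no offense was ever reported on that measure (an assumption
    made for this sketch).
    """
    if prospective_age is None and retrospective_age is None:
        return "no_report"
    if retrospective_age is None:
        return "prospective_only"
    if prospective_age is None:
        return "retrospective_only"
    if prospective_age != retrospective_age:
        return "age_disagreement"
    return "full_agreement"


def agreement_shares(pairs):
    """Share of each outcome among respondents who reported on either measure."""
    counts = Counter(classify(p, r) for p, r in pairs)
    counts.pop("no_report", None)  # the denominator excludes never-reporters
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}


# Hypothetical respondents: (prospective age, retrospective age)
example = [(14, None), (15, 15), (None, 16), (13, 15), (None, None)]
shares = agreement_shares(example)
```

In this toy input each of the four outcomes accounts for one-fourth of the respondents who reported an offense on either measure; in the study itself, only the full-agreement category was that small.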

Kevin Weinfurt and Patricia Bush (1996) 
observe that there is an assumption in longitu- 
dinal studies of childhood substance use that 
repeated measurement of a particular variable 
for a particular child will be “reliable and non- 
contradictory” across sampling waves. How- 
ever, they note that such assumptions are rarely 
examined, except perhaps as internal consistency
in cross-sectional designs. The authors
propose that, alternatively, external consistency,
which is the change in response patterns
over time, should be examined. Using
a framework presented by Bailey, Flewelling, 
and Rachal (1992), they examined logical errors 
(errors in which the respondent indicates 
engagement in some behavior at an earlier time 
point and no ever-engagement in that behav- 
ior at a later time point) and estimation errors 
(errors in which a respondent indicates less 
ever-engagement at a later time period than they 
indicated at an earlier time period) committed 
by elementary and junior high school students 
who were administered drug use surveys every 
year for four years. Students were asked to indi- 
cate the number of times they had more than 
one puff of a tobacco cigarette, more than a 
puff of marijuana, or more than a sip of alco- 
hol. It was unclear what the time bounds were 
for these responses (e.g., in the last month, six 
months, or past year). 
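
The two error types can be stated compactly in code. The sketch below assumes that each report is a lifetime ("ever") count of use, which the original study did not make clear, and the function names and example values are hypothetical.

```python
from collections import Counter


def classify_pair(earlier: int, later: int) -> str:
    """Classify two reports of lifetime use from successive surveys.

    Assumes both values are lifetime counts, so a later value should
    never be smaller than an earlier one.
    """
    if earlier > 0 and later == 0:
        # Use reported earlier, then denied entirely
        return "logical_error"
    if later < earlier:
        # Use still reported, but less of it than before
        return "estimation_error"
    return "consistent"


def error_counts(reports):
    """Tally outcomes over consecutive waves for one respondent."""
    return Counter(classify_pair(a, b) for a, b in zip(reports, reports[1:]))


# Hypothetical lifetime counts reported in four annual surveys
counts = error_counts([2, 0, 3, 1])
```

For the hypothetical respondent above, the first lag is a logical error (2 then 0), the second is consistent (0 then 3), and the third is an estimation error (3 then 1).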

Logical errors tended to decrease for all sub- 
stances as the students became older, while the 
percentage of estimation errors remained sta- 
ble (alcohol and cigarettes) or increased slightly 
(marijuana). As a result, the ratio of logical 
to estimation errors decreased across the sur- 
vey years. Both logical and estimation errors 
were found for every substance in every lag 
of the study, but whereas estimation errors 
remained relatively stable over time, logical 
errors decreased for all substances as students 
got older. Interestingly, the logical errors for 
marijuana use were quite high in the first two 
years of the study, indicating that over half of 
students who reported marijuana use while in 
fourth or fifth grade reported no use the fol- 
lowing year, which would indicate, if it were 
the case that student reports become more accu- 
rate as they get older, that nearly half of mar- 
ijuana use reported in fourth and fifth grades 
was a false positive report. Apparently, counter- 
intuitively, shorter time lag between responses 
resulted in worse errors than longer time lag. 
However, the authors note that any estimation 
of error that involved Time 1, or the youngest
respondents, resulted in the greatest errors, and 
so it is confounded with age of the subjects. 

Deborah Freedman and colleagues 
(Freedman, Thornton, Camburn, Alwin, and 
Young-DeMarco, 1988) used a life history 
calendar (LHC, detailed later in this chapter) in 
order to aid respondent recall. These calendars 
were used in a validation study with 900 
23-year-olds who were asked to report about 
the nine years since their fifteenth birthdays. 
Data from initial interviews from one particular
month in 1980 were compared with
retrospective recall from 1985 of that same 
month in 1980. The authors reported that of 
the 900 LHCs, only four calendars had months 
with no data, and the 1985 retrospective data 
corresponded highly with the 1980 interview 
data and interviewers evaluated the procedure 
favorably. Overall, those activities with a 
relatively high level of volatility, such as 
employment activity, were least consistent 
between the two reporting periods, and thus 
the authors note that researchers should be 
aware that highly variable events will likely
be measured less accurately than more stable 
events. (On a related note, Henry, Moffitt, 
Caspi, Langley, and Silva (1994) found that 
retrospective measures of psychological or 
attitudinal variables were less consistent with 
prospective data than were measurements of 
more objective characteristics such as changes 
in residence or police contacts.) A cautionary 
note, however, is sounded by Taris (2000), who 
concluded after a review of research on the 
LHC that the LHC improved information only 
some of the time and for some but not other 
variables. 

In order to examine the quality of informa- 
tion that could be gathered on lifecycle events 
in one retrospective survey, H. Elizabeth Peters 
(1988) conducted a study that compared data 
from a retrospective marital history with those 
derived for the same individuals from panel 
information. Data were taken from the Young 


Women’s cohort of the National Longitudinal 
Surveys of Labor Market Experience (NLS). In 
1978, respondents were asked about the dates 
of past marital events (i.e., marriage, divorce, 
remarriage) and husbands’ characteristics (for 
verification) for each marriage, and these ret- 
rospective marital histories were updated five 
years later. Meanwhile, approximately annual 
panel information was also available with 
similar survey questions (i.e., current marital 
status). The author found that this panel information
sometimes yielded results that were
difficult to interpret, because it was not neces- 
sarily clear if more than one event of short dura- 
tion had occurred in one year (e.g., the subject 
was married in one year, and in the next year 
they divorced and remarried, but in both assess- 
ments they were coded as “married”). Thus, 
though the retrospective responses were more 
prone to memory error, these data tended to be 
more complete. 

Results indicated that when a marital event 
was reported in both the retrospective and 
panel sources, there was substantial agreement 
about the dates of the event, and when there 
were errors, they appeared to be related to dif- 
ficulty of recall in the retrospective histories. 
Specifically, there was very little discrepancy 
between the reports of age at first marriage 
between the retrospective and panel sources; 
however, subsequent divorce and remarriage 
analysis, while qualitatively similar, yielded 
less precise parameter estimates. When the time 
between the date of the event reported in the 
retrospective survey and the date of the actual 
survey was greater, there was a substantially 
greater likelihood of inconsistent information. 
The author argues that those characteristics that
vary over time, such as labor force participa- 
tion, experience, and earnings, are usually not 
reported in retrospective surveys because infor- 
mation about the timing and amounts would be 
much less reliable than marital histories. 

In 2001, Charles Pierret published an analysis
comparing annual and biennial responses
to the National Longitudinal Survey of Youth
1979 (NLSY79). The NLSY79 switched from annual to
biennial interviews in 1994, and at that time asked respondents
to report on 1992 despite the fact that they 
had already completed interviews on 1992 in 
1993. The NLSY79 is a panel study of men and 
women who were aged 14 to 21 in 1978, assess- 
ing a variety of topics including schooling, 
employment, health, marriage, fertility, income, 
program participation, and crime and illicit 
activities. By increasing the reporting period 
to two years, the researchers anticipated sev- 
eral threats to the accuracy of the reports. 
There was concern that with a longer report- 
ing period, it would be more likely that recall 
would decay, and that there would be more 
events to recall. Finally, there was concern that 
similar or related events would occur during the 
reference period, causing difficulties in distin- 
guishing between the events, leading to omis- 
sion of an event or confusion between multiple 
events. 

On their face, the results indicated that the switch
from annual to biennial reporting would have 
only moderate impacts on event history data. 
Overall, receiving food stamps and AFDC (Aid 
to Families with Dependent Children, i.e., wel- 
fare) was somewhat under-reported, and bene- 
fits were moderately over-reported in the longer 
reporting periods. However, at the individ- 
ual level, discrepancies were cause for greater 
concern. Over two-thirds of AFDC and food 
stamp recipients reported a different history of
assistance receipt when asked after one versus
two years’ time. Additionally, one out of
three new employers were not reported, and 
new dates were given for fully half of all dates 
reported. 
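
The contrast between modest aggregate differences and large individual-level discrepancies can be illustrated with a small sketch. Everything in it is hypothetical: the function name, the representation of a receipt history as a tuple of 0/1 indicators per period, and the four example respondents. It is not Pierret's procedure.

```python
def compare_reports(first_report, second_report):
    """Compare two sets of receipt histories for the same reference period.

    Each argument is a list with one tuple per respondent; each tuple holds
    0/1 indicators of receipt in successive sub-periods. Returns the
    prevalence of any receipt in each report, and the share of recipients
    (in either report) whose two histories differ.
    """
    n = len(first_report)
    prevalence_first = sum(any(h) for h in first_report) / n
    prevalence_second = sum(any(h) for h in second_report) / n
    recipients = [(a, b) for a, b in zip(first_report, second_report)
                  if any(a) or any(b)]
    differing = sum(a != b for a, b in recipients)
    share_differing = differing / len(recipients) if recipients else 0.0
    return prevalence_first, prevalence_second, share_differing


# Hypothetical quarterly receipt histories for four respondents
one_year_recall = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0)]
two_year_recall = [(1, 0, 0, 0), (0, 1, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0)]
result = compare_reports(one_year_recall, two_year_recall)
```

In this toy input the prevalence of any receipt is identical in the two reports (0.75), yet two of the three recipients give a different history, which is the pattern the paragraph above describes.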


5 Research techniques for 
improving respondent recall 


It seems evident that the best way to optimize 
the accuracy of respondent recall is to min- 
imize the length of time between the events 
to be recalled and the time of the interview.
The maximum recall time that can be expected 
to yield reliable, valid results varies by the 
level of detail that must be recalled. For exam- 
ple, researchers who assess dietary intake have 
found that even a 24-hour recall period can 
result in reporting error due to inaccurate recall 
of food consumed (Fowler, 1998). Alterna- 
tively, given that even the most highly regarded 
prospective longitudinal studies of substance 
abuse and delinquency involve annual assess- 
ments, it is important to maximize report- 
ing accuracy by using techniques to improve 
recall. The following are techniques which 
have been suggested to improve respondent 
recall: 


1. Ask long, rather than short questions. Fowler 
(1998) notes that this does not mean asking 
convoluted questions, but instead including 
introductory material that will prepare the 
respondent for the question. In addition to 
preparing the respondent for the topic area 
of the question to follow, this increases the 
time that the respondents have to search their 
memories. 

2. First ask a summary question (Beckett, et al., 
2001). Details on sensitive topics may be bet- 
ter recalled if the respondent is first asked 
one or more summary questions. For example,
Beckett, et al. report that in the MFLS-2,
rather than immediately asking respondents 
specific questions about all pregnancies they 
have ever had, respondents were first asked 
about the total number of pregnancies of liv- 
ing and deceased children they have had. 
Similarly, in the National Youth Survey, 
respondents are first asked about the total 
number of children they have before they are 
asked any specific details about their chil- 
dren. 

3. Ask multiple related questions, which will
improve the probability that an event will 
be recalled and reported (Fowler, 1998; Rut- 
ter, et al., 1998). This is effective in part 
because it stimulates associations with what
the respondent is supposed to report. 

4. Train interviewers to ask alternative questions
if a respondent is unable to provide
a precise response (Beckett, et al., 2001; 
Freedman, et al., 1988). Data quality for 
events in the more distant past may be 
improved by asking alternative questions 
when necessary. For example, if a respon- 
dent cannot recall an exact date, a follow-up 
question about the respondent’s age at the 
time of the event, or asking for an approxi- 
mate answer if a specific answer cannot be 
recalled, may yield a usable answer. 

5. Similarly, train interviewers to prompt
with consistent follow-up questions (Beckett, 
et al., 2001; Rutter, et al., 1998; Freedman, 
et al., 1988). Beckett, et al. (2001) argue 
that well-trained interviewers may be able 
to elicit information about underreported 
events. Specifically, they note that MFLS 
interviewers were trained to probe fur- 
ther about “miscarriage, induced abortion, 
missed episodes of contraceptive use, or 
spousal separation if the period between 
births was greater than four years.” Addi- 
tionally, Rutter, et al. (1998) note that this 
facilitates separation of attitudes from behav- 
ior, relating an anecdote from a prior study 
in which a woman replied that her hus- 
band “did nothing” around the house, but 
when probed further with specific ques- 
tions about activities during the previous 
week, it became apparent that he was very 
much involved around the house, which she 
then acknowledged with surprise. The initial 
answer reflected her negative attitude toward 
her partner while the probes yielded a more 
factual account of his involvement in house- 
hold tasks. A drawback to this technique is 
that interviewers must probe consistently, 
otherwise some respondents will essentially 
receive different interviews than the rest of 
the respondents. 


6. Use a calendar, or possibly a life history
calendar (LHC; Freedman, et al., 1988;
Beckett, et al., 2001; Fowler, 1998; Rutter, 
et al., 1998; Scott and Alwin, 1998). Recall 
may be improved by providing respondents 
with a calendar on which major national 
and socio-cultural events are pre-printed. 
This may improve the internal consistency 
and sequencing of the respondent’s answers 
because they stimulate recall activities that 
help them place events in time, and because 
they help to generate boundaries for report- 
ing periods. Because LHCs are physical, 
visual aids, they can help respondents to 
relate visually and mentally to the timing 
of different events. LHCs can be bounded in 
any time unit needed — day, week, month, 
or year — and they may make use of a variety 
of substantive domains as defined by the 
need of the study, including geographical 
residence, marital and cohabitation statuses 
and transitions, fertility, school enrollment, 
and employment. Variables may be cate- 
gorical, ordinal, or interval. Interviewers 
must be trained in assisting the respon- 
dent with completing an LHC, which may 
constitute the entire interview, or may be 
integrated into a total interview format. If 
the LHC is integrated into a larger interview, 
Freedman, et al. (1988) note that the LHC 
would typically comprise the first part of the 
interview. If, however, one elects to inter- 
sperse calendar questions with non-calendar 
questions, this may potentially aid recall 
and time sequencing in the traditional inter- 
view section, but it may also confuse the 
respondent. A potential practical drawback 
of the calendar method is that as the com- 
plexity of the calendar increases, so does the 
complexity of coding responses, resulting in 
a potentially tedious (and expensive) coding 
process. Scott and Alwin (1998) took the life 
history calendar a step further, and argued 
that life histories should also assess such 
constructs as past levels of psychological 
well-being. Thus, in order to represent this
broader view of life history data, they sug- 
gest that the measurement should include 
variables that fall into three categories: (1) 
event histories (i.e., the collection of past 
events, including their timing, duration, and 
sequencing, and also present statuses, and 
future expectations); (2) the accumulation 
of experiences (i.e., where people are at 
a particular point in time, including the 
accumulation of experiences that result from 
people’s event histories and experiences 
resulting indirectly from life events, such as 
amount of schooling, and expectations for 
future experiences); and (3) the evaluation 
or interpretation of experiences (in either 
the present, the past, or as anticipated in the 
future). 

7. Provide benchmarking (from Beckett, et al.,
2001; Fowler, 1998; Pierret, 2001). Respon- 
dent recall may be improved by reminding 
them of their situation at the time of the 
previous wave and bringing them forward 
from that point. Several large, national 
surveys, including the Panel Study of
Income Dynamics and the Health and Retirement
Study, take advantage of computer
technology to integrate benchmarking into 
subsequent interviews. Beckett, et al. (2001) 
note a number of caveats for this method: 
(1) if the question involves a concept about 
which more is known at the later time period 
than the earlier time period, benchmarking 
may confuse the respondent more than help 
them; (2) benchmarking requires financial 
resources, as the interviewer must have 
access to responses to earlier waves when 
conducting the interview; (3) benchmarking 
works best when it works forward in time 
from the previous wave, while retrospective 
studies often work backward from the 
current interview; and, most importantly, 
(4) if a respondent makes a reporting error 
at one time period, that error will be carried 
forward into future waves and may not 
be corrected because the earlier reports
become the “correct” reports. A potential 
drawback of this method is that it can create 
a “stability bias,” creating more consistency 
across time than was actually there. 

8. Vary recall period with saliency (Beckett, 
et al., 2001). Because more salient life events 
are better recalled, the recall period can be 
restricted for common, nonsalient events, 
whereas more salient events can be collected 
for longer recall periods. 

9. If the goal is to gain a broad understanding of 
the respondent’s life course, gather data on a 
range of potentially relevant experiences of 
a major kind using a mixture of more open 
questioning about important domains and 
systematic questioning of details (Rutter, 
et al., 1998). 
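
To make the structure of a life history calendar (technique 6 above) more concrete, the following sketch represents one as a grid of time units by substantive domains. The class name, its methods, and the example domains and statuses are hypothetical and are not taken from Freedman, et al. (1988); a real instrument would be considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class LifeHistoryCalendar:
    """A grid with one row per domain and one cell per time unit."""
    n_units: int            # number of time units (e.g., months) covered
    domains: tuple          # e.g., ("employment", "residence")
    grid: dict = field(init=False)

    def __post_init__(self):
        # None marks a time unit for which nothing has been recorded yet
        self.grid = {d: [None] * self.n_units for d in self.domains}

    def record(self, domain, first, last, status):
        """Record a status for time units first..last (inclusive, 0-based)."""
        for i in range(first, last + 1):
            self.grid[domain][i] = status

    def missing_units(self, domain):
        """Time units with no data, which an interviewer could probe."""
        return [i for i, s in enumerate(self.grid[domain]) if s is None]


# Hypothetical twelve-month calendar with two domains
lhc = LifeHistoryCalendar(12, ("employment", "residence"))
lhc.record("employment", 0, 5, "full_time")
lhc.record("employment", 8, 11, "unemployed")
gaps = lhc.missing_units("employment")  # months 6 and 7 are unaccounted for
```

Listing the empty cells is one way the visual boundaries of the calendar can translate into consistent follow-up probes, and it also hints at why coding becomes more laborious as domains and time units are added.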


6 Conclusion 


Survey data on cognitive states can be collected 
in a design that is truly prospective, but data 
on experiences or behavior must necessarily be, 
at least to some extent, retrospective. All other 
things being equal, longer recall periods appear 
to generate less accurate results than shorter 
recall periods. To the extent that the use of 
longer recall periods in longer term retrospec- 
tive studies can be justified, the justification 
is based not on improved or equal accuracy, 
but on such considerations as cost, and the 
time between initial data collection and avail- 
ability of the data for analysis and dissemina- 
tion of the results. This does not mean that we 
should abandon longer term retrospective stud- 
ies because poor quality data are inevitable in 
such studies. As described here and by Mayer 
in Chapter 6 of this volume, it is possible in 
studies with longer recall periods to take mea- 
sures to improve respondent recall over longer 
spans of time. What is important is to recognize
the inherent problems in recall for any
length of time, the magnification of the problem
of accuracy when recall periods become
longer, and the need to match appropriate techniques
to enhance recall with the length of
recall required in a specific study. For purposes 
of maximizing accuracy of recall in longitudinal 
survey research in general, however, it remains 
the case that minimizing the time between the 
events or behaviors of interest and the questions 
about those events or behaviors, when feasible, 
is preferable to the use of longer term retrospec- 
tive data. 


Glossary 


Cross-sectional retrospective research design 
This is research that uses a cross-sectional 
design, but by the use of retrospective recall 
methods, gathers longitudinal data. These data 
are designed to represent attitudes, behaviors, 
and events in the respondents’ lives across time, 
despite the fact they are collected at a single 
point in time. 


Heaping A phenomenon that occurs in mem- 
ory recall when respondents cannot recall a 
specific value, so instead they provide a “prototypical”
response near the actual value. As a
result, certain dates, ages, durations, or frequen- 
cies may be over-represented. 


Life history calendar (LHC) A life history cal- 
endar is an interview technique that assists 
respondents in supplying accurate information 
on events that occurred in the past. LHCs 
include references to national holidays and 
other known events, and may be bounded by 
the time unit most appropriate to the study 
(such as a week, a month, or a year). LHCs can 
help respondents to relate visually and men- 
tally to the timing of different events, which 
should improve their ability to recall events 
and to place those events in their proper time 
sequence. 


Panel study A longitudinal study in which a 
panel of individuals is interviewed at intervals 
over a period of time. 


Prevalence A type of data that reflects 
whether or not an event has happened, or 
whether or not a respondent has engaged in 
a behavior. It does not address how often the 
event or behavior may have occurred. 


Prospective reporting Prospective means 
looking to the future. In a prospective panel 
study, measurement occurs with each wave 
of the study, moving forward (even though 
the data collected are about the present or 
relatively recent past). 


Reliability Reliability is the extent to which 
a construct is consistently measured without 
random measurement error. 


Retrospective recall Thinking about, remem- 
bering, and reporting events that happened in 
the past. 


Telescoping A phenomenon that occurs in 
memory recall that refers to the allocation of 
events, characteristics, or behaviors to a more 
recent time period than the one in which they
actually occurred. 


Validity This is concerned with assessing 
whether what actually is being measured is 
related to some external reality. That is, are we 
measuring what we think we are measuring? 
Chapter 8


A review and summary of studies 
on panel conditioning 
David Cantor 


1 Introduction 


An important feature of a longitudinal sur- 
vey is the possibility of panel conditioning. 
This refers to when participation in earlier 
waves of the panel affects the responses in 
subsequent waves. If there are significant con- 
ditioning effects, the utility of a longitudinal 
survey to measure change is compromised. The 
purpose of this chapter is to provide a review 
of the literature on conditioning, while trying 
to tie it together around three general goals. 
One goal is to describe the different types of 
conditioning that have been observed. As will 
be noted below, conditioning has been identi- 
fied for almost all survey phenomena, including 
behavioral and attitudinal or opinion data. The 
review will provide examples of each of these. 
The second goal is to provide possible expla- 
nations for conditioning. This is important so 
that designers and analysts have a way to interpret,
and possibly adjust, data when conducting
analysis of panel data. The third goal is to pro- 
vide the reader with the size of observed effects. 
It is one thing to say that conditioning effects 
exist, but like any other type of measurement 
error, it is important to understand how large 


these effects might be when making judgements 
about design and interpretation. 


2 Theoretical and analytic 
principles 


One reason why there is not a great deal known 
about panel conditioning is that it poses a num- 
ber of analytic challenges. Unless one conducts 
a study specially designed to address condition- 
ing, it is difficult to unequivocally state when 
it is occurring and why. 


2.1 Theoretical distinctions 


In their extensive review of different explana- 
tions of conditioning, Waterton and Lievesley 
(1989) enumerate six reasons why conditioning 
might occur: 


1. Changing behavior or attitudes by raising 
consciousness. 

2. Freezing attitudes. 

3. More honest reporting of socially desirable 
behavior. 

4. Improved understanding of the interviewing 
rules. 
5. Higher motivation.
6. Lower motivation. 


These distinctions are theoretically useful. 
They provide a framework for designing and 
analyzing panel data that exhibit conditioning 
effects. If a prior interview changes the phe- 
nomenon being measured (e.g., 1 or 2 above), 
then the result is partly a function of the mea- 
surement process, rather than something occur- 
ring in the population of interest. Adjustment 
of later waves of a panel or collecting alter- 
native measures of change would need to be 
considered. If changes in the response process 
(points 3-6 above) are occurring, then it is critical
to understand which explanation holds. If
conditioning leads to higher motivation, less 
social desirability bias or greater understanding 
of the response task, then later waves of a panel 
are best. On the other hand, if conditioning 
is related to less desirable respondent behav- 
ior, such as decreased motivation or avoidance 
of additional burden, then responses to later 
waves of a panel have higher error. 

Conditioning is a concern because it con- 
founds measurement error with real change. 
Partly for this reason, rotating panel designs 
(Kalton and Citro, 1993) are typically structured 
to be balanced. This means that panels contribut- 
ing to an estimate at a point in time are at the 
same degree of maturity. An assumption made 
when analyzing changes between periods for a 
rotating panel is that conditioning effects do not 
change over time and they are additive (e.g., 
rather than multiplicative). If this is true, then the
change estimates across periods are unbiased. 
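
The role of the additivity assumption can be made explicit with a short worked derivation. The notation is introduced here for illustration only and does not come from Kalton and Citro (1993): let $\mu_t$ be the true population value in period $t$, $c_k$ the conditioning effect for a rotation group giving its $k$-th interview, and $K$ the number of rotation groups contributing to each estimate in a balanced design.

```latex
% Additive, time-constant conditioning in a balanced rotating panel
\begin{align*}
E[\hat{y}_{t,k}] &= \mu_t + c_k, \qquad k = 1, \dots, K \\
E[\bar{y}_t] &= \frac{1}{K}\sum_{k=1}^{K} E[\hat{y}_{t,k}] = \mu_t + \bar{c},
  \qquad \bar{c} = \frac{1}{K}\sum_{k=1}^{K} c_k \\
E[\bar{y}_{t+1} - \bar{y}_t] &= (\mu_{t+1} + \bar{c}) - (\mu_t + \bar{c})
  = \mu_{t+1} - \mu_t
\end{align*}

% Multiplicative conditioning, for contrast
\begin{align*}
E[\hat{y}_{t,k}] &= \mu_t (1 + c_k) \\
E[\bar{y}_{t+1} - \bar{y}_t] &= (1 + \bar{c})(\mu_{t+1} - \mu_t)
\end{align*}
```

Under the additive form, the level estimate in each period is off by $\bar{c}$ but the change estimate is not, because the same mix of panel maturities contributes to both periods. Under the multiplicative form, the change estimate is scaled by $(1 + \bar{c})$ and is therefore biased whenever $\bar{c} \neq 0$.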


2.2 Analysis requirements 


One reason that the measurement of condition- 
ing is not common is that it takes relatively 
elaborate designs to isolate the effects. Observa- 
tion of changes in measures over time by itself 
is not evidence of conditioning because it con- 
founds real change with conditioning. A com- 
mon analytic strategy to isolate a conditioning 


effect is to compare two groups which have 
been exposed to the survey a different num- 
ber of times, but have been interviewed during 
approximately the same calendar period. Dif- 
ferences between the groups should then the- 
oretically be due to conditioning. One way to 
make this type of comparison is with a rotating 
panel design, where at each wave of a panel, 
there is a new sample being interviewed for 
the first time. Similar, but less ideal, designs 
compare panels to other cross-sectional surveys 
that are conducted during the same time period. 
For example, Wilson and Howell (2005) com- 
pare the trend in measures in the prevalence of 
arthritis for a panel to trends from a continu- 
ing cross-sectional survey (the National Health 
Interview Survey). 
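
A minimal sketch of the two-group comparison follows. It assumes simple random samples, an outcome coded 0/1, and no weighting or nonresponse adjustment, none of which would hold in a real rotating panel; the function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def conditioning_contrast(experienced, first_time):
    """Difference in means between two groups interviewed in the same period.

    experienced: responses from a group that has been interviewed before
    first_time:  responses from a group being interviewed for the first time
    Returns the difference (experienced minus first-time) and an
    unequal-variance standard error for that difference.
    """
    diff = mean(experienced) - mean(first_time)
    se = sqrt(variance(experienced) / len(experienced)
              + variance(first_time) / len(first_time))
    return diff, se


# Hypothetical 0/1 reports of some behavior in the same calendar period
experienced_group = [1, 0, 1, 1, 0, 1, 1, 1]
first_time_group = [0, 1, 0, 1, 0, 1, 0, 0]
diff, se = conditioning_contrast(experienced_group, first_time_group)
```

Because both groups are interviewed in the same calendar period, real change in the population affects them equally, so the difference is attributable to conditioning only to the extent that the groups are otherwise comparable.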

Nonresponse is a concern when assessing 
possible conditioning effects. Inevitably, data 
used from later panel waves will be subject 
to higher levels of nonresponse because of the 
cumulative effects of attrition. Studies that esti- 
mate conditioning effects typically implement 
a nonresponse adjustment. Some studies also
use simulations to assess whether those who
drop out would affect conclusions about
conditioning if “no change” were assumed for
the nonrespondents over waves.

To rigorously assess the reasons behind con- 
ditioning, it would also be necessary to control 
for interviewer behavior and changes in inter- 
viewers between waves (Van Der Zouwen and 
Van Tilburg, 2001; O’Muircheartaigh, 1989). 
O’Muircheartaigh (1989) argues that to pinpoint 
the sources of conditioning one would need an 
experimental design that was specifically set up 
to tease out all of these effects (interviewers; 
respondents; interviewer—respondent interac- 
tions). Needless to say, this type of design has 
not been implemented. Nonetheless, it is still 
useful to review evidence of conditioning and 
assess how it fits within the processes Waterton 
and Lievesley (1989) enumerated. This should 
provide some perspective on the types of effects 
that have been found and how they might fit
within the above theoretical framework. 


3 Conditioning and changes 
in behavior 


Several studies have found evidence of con- 
ditioning that leads to changes in behavior. 
The clearest examples of this are studies of
the electorate and voting behavior (Kraut and 
McConahay, 1973; Yalch, 1976; Traugott and 
Katosh, 1979), which have found that participa- 
tion in a pre-election survey motivates respon- 
dents to subsequently vote. Clausen (1968) 
reports on a study of persons who participated 
in a pre-election survey in 1964. Voting information
for those who participated in the pre-election
survey was retrieved and compared to
the proportion that voted in the general pop- 
ulation. The analysis compares estimates from 
respondents on a panel survey who were inter- 
viewed prior to and after the 1964 election to 
estimates from one-time surveys that only inter- 
viewed after the election. A series of adjust- 
ments are made to the panel survey to make 
the population comparable to the other surveys, 
as well as trying to account for differences in 
nonresponse patterns. After adjustments, there 
remained a significant difference between the 
post-election estimates from the panel design 
and the cross-sectional survey. Up to 7 percent- 
age points were attributed to the pre-election 
interview stimulating turnout for the actual 
election. Traugott and Katosh (1979) replicated 
these results and found a larger stimulus effect. 

Wilson and Howell (2005) compare the preva- 
lence of reports of arthritis in later panels of the 
Health and Retirement Study (HRS) to cross-sectional
estimates from the National Health
Interview Survey (NHIS). The HRS rate of 371 
per thousand in 1992 goes up to 415 per thou- 
sand in 1996, which is statistically significant. 
The NHIS rate for this time period stays virtually
flat (296 to 304, not statistically significant).
There are important differences between
the two surveys. The questions used in the 
surveys are not identical and the surveys pro- 
vide different contexts and ordering of sur- 
vey topics. In addition, the HRS has a lower 
response rate at both the initial contact and 
from attrition over the panel. These differences 
are partly reflected by a prevalence rate for 
arthritis on the HRS that is 25% higher than 
on the NHIS. However, even after the authors 
attempt to adjust for differences in response 
rates, the divergent trends remain. In further 
support for the conditioning hypothesis, it is 
shown that a supplemental HRS sample introduced
in 1998 exhibits prevalence rates similar to
those of the first wave in 1992. As with the 1992
panel, prevalence in the 1998 panel rises
in later panel interviews.
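
The divergence in trends described above can be made concrete with a short calculation. The sketch below uses only the rates quoted in the text (per thousand). The function name `relative_change` and the "excess change" summary are illustrative assumptions, not the procedure used by Wilson and Howell (2005), and the calculation ignores sampling error and the nonresponse adjustments the authors applied.

```python
def relative_change(start, end):
    """Percent change from start to end."""
    return (end - start) / start * 100.0

# Arthritis prevalence per thousand, as quoted in the text.
hrs_1992, hrs_1996 = 371, 415      # panel survey (HRS)
nhis_1992, nhis_1996 = 296, 304    # repeated cross-section (NHIS)

hrs_change = relative_change(hrs_1992, hrs_1996)
nhis_change = relative_change(nhis_1992, nhis_1996)

# Growth in the panel beyond the cross-sectional trend, in per-thousand points.
excess_points = (hrs_1996 - hrs_1992) - (nhis_1996 - nhis_1992)

# Baseline gap between the two surveys in 1992 (HRS relative to NHIS).
baseline_gap = relative_change(nhis_1992, hrs_1992)

print(f"HRS change:  {hrs_change:.1f}%")
print(f"NHIS change: {nhis_change:.1f}%")
print(f"Excess change in panel: {excess_points} per thousand")
print(f"HRS above NHIS in 1992: {baseline_gap:.1f}%")
```

The baseline gap reproduces the roughly 25% difference in prevalence noted in the text, while the excess change isolates the part of the HRS increase that the NHIS trend does not share.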

The authors discuss several possible explana- 
tions for the differences in trends between the 
NHIS and the HRS. The one they seem to think 
is best supported by the evidence is that the 
early interviews make respondents more aware 
of the possibility that they may have arthritis. 
This results in proactive follow-up with their 
doctors and even perhaps more actively asking 
about the possibility they may have arthritis. 
This argument is made through a process of elim- 
ination. By trying to control for differences in 
design and nonresponse, the most logical expla- 
nation remaining is the conditioning hypothesis. 
Unfortunately, the authors do not test this 
hypothesis for other diseases or other questions 
that may be common across the two surveys. 

Veroff et al. (1992) follow up work by Wilson 
et al. (1984) suggesting that asking respondents 
about their feelings about their marriage will 
have an effect on the quality of the marriage. 
This study arose out of a concern from their 
Human Subjects Review Board that intensive 
probing of respondents about their marital rela- 
tionship may raise concerns that may not have 
been brought up if the interview never took 
place. They test this by running two paral- 
lel panels. One panel consists of an intensive 
set of interviews with both the husband and
wife about their relationship over a four-year
period. Four relatively intense interviews were 
completed. The control group had fewer, and 
shorter, interviews over the four-year period. 
They found that the panel receiving more inten- 
sive interviewing had greater variance in their 
marital attitudes after the second interview. 
After the fourth year, they also found that for 
certain race-gender groups, the experimental
group had more positive attitudes about their 
marriage, as measured by different scales on 
marital stability. For example, Blacks seemed to 
exhibit a much larger conditioning effect than 
Whites. While this study found some evidence 
of conditioning, the results were not consis- 
tent across groups. In addition, the study does 
not have an external measure of the quality of 
the marriage. They have only what respondents 
reported on the items within the survey. 

Other evidence of behavior change is found 
by Battaglia et al. (1996). These authors hypoth- 
esized that asking mothers about the immu- 
nization status of their children will lead to 
respondents getting their babies vaccinated 
after the interview. The study was not a panel, 
but one which collected immunization records 
from respondents. This allowed the researchers 
to examine whether respondents vaccinated 
their children after the interview. They found 
that among those that reported their children 
as not having up-to-date vaccinations, 9.2% got 
at least one vaccination within 90 days after 
the interview. They conclude that for a panel 
survey, this would introduce an upward bias 
in prevalence estimates of approximately 2 per- 
centage points (60% vs. 62%). As noted by the 
authors, this is an overestimate of the effect of 
conditioning, since it does not account for the 
natural growth of the percent of children that 
are vaccinated as they get older. 

One theme that runs through many of the 
studies on conditioning is that respondents who 
are the least committed or certain about the out- 
come of interest will be the most subject to a 
conditioning effect. With respect to changing
behaviors, the interview serves as a stimulus 
to take some action. For example, for voting 
it seems to significantly raise respondents’
propensity to vote. Consistent with this
idea, Clausen finds that there seemed to be a 
bigger effect on turnout among Whites than for 
Blacks. This is attributed to the greater initial 
interest by Blacks in the election. For med- 
ical information, such as arthritis or immu- 
nizations, the interview may make respondents 
more aware of something that they feel they 
should be aware of. If this is true, then those 
most aware of the condition and/or the most 
able to act will be the least likely to change their 
behavior. Battaglia et al. (1996) found some evi- 
dence that those in lower income groups and 
with the least education were the most likely 
to get vaccinations after the interview. One 
might expect this assuming that these popula- 
tion groups are the least aware or least likely to 
get a vaccination. 

One would expect that behaviors that are dif- 
ficult or expensive to engage in will be least 
likely to be subject to this type of condition- 
ing effect. For example, one would not expect 
making major consumer purchases to be influ- 
enced by an interview. Similarly, events that 
are not directly under the control of the respon- 
dent, such as victimization or being hospital- 
ized, should not be influenced by an interview. 
Consequently, observed conditioning effects for 
these types of phenomena are likely to be the 
result of changes in the response process, rather 
than actual changes in behavior. In the next 
section we review evidence of conditioning for 
these types of surveys. 


4 Conditioning and changes in the 
process for reporting behaviors 


Some of the earliest evidence of conditioning 
came from consumer panels that asked about 
purchases (Prais, 1958; Ferber, 1953; Ehrenberg, 
1960). Neter and Waksberg (1964a) report one 
of the first studies using a rotating panel. They
compare the second and third interviews of the
panel.¹ They find a significant drop of 9% in
the reporting of household maintenance and 
repairs between the second and third interviews 
(173 million to 157 million). The drop was more 
pronounced for jobs of less than $20 value and 
for designated respondents such as the wife or 
any knowledgeable respondent. When the head 
of the household or a combination of the hus- 
band and the wife was selected, there was not 
a significant drop. 

There are several other studies that have 
looked at consumer purchases. Silberstein and 
Jacobs (1989) report on the mean expenditures 
from the Consumer Expenditure Survey, an 
ongoing survey in the US that collects data for 
input into the Consumer Price Index. From Neter 
and Waksberg above, one might expect there 
would be a tendency for smaller purchases to 
be reported less often at later interviews. This 
should increase the mean expenditures at later 
times in sample. When looking at the mean 
expenditures for all types of purchases, Silber- 
stein and Jacobs did not find any significant pat- 
terns. Significant changes were observed when 
looking at 7 of 17 more detailed expense classes. 
They find some evidence of decreased mean 
expenditures between the second and fifth inter- 
views. However, this pattern was not consistent. 
For some expenditures, there was an increase 
rather than a decrease. In addition, many of the
statistically significant findings were not
substantively large.
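
One rough way to judge whether 7 significant results out of 17 expense classes could plausibly arise by chance is a binomial tail probability. The sketch below assumes a 5% significance level and independent tests across classes. Both are assumptions made here for illustration; neither is stated in the source, and the helper name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha): the chance of k or more
    significant results among n independent tests when no effect exists."""
    return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
               for i in range(k, n + 1))

n_classes = 17        # detailed expense classes examined
n_significant = 7     # classes with significant changes
alpha = 0.05          # assumed significance level (not given in the text)

expected_by_chance = n_classes * alpha
p_tail = prob_at_least(n_significant, n_classes, alpha)

print(f"Expected significant by chance: {expected_by_chance:.2f}")
print(f"P(at least {n_significant} of {n_classes}): {p_tail:.2e}")
```

Under these assumptions fewer than one class would be expected to reach significance by chance, so the count alone is unlikely to be noise; it is the inconsistent direction and small size of the changes that limit the interpretation.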

Pennell and Lepkowski (1992) analyze 
reports of income from the Survey of Income 
and Program Participation (SIPP), a longitudi- 
nal survey. By comparing panels that overlap 
in time, they are able to compare data at a 
comparable calendar period from samples interviewed
a different number of times. They examine
income recipiency, income amounts, health
insurance coverage, and labor force participation.
They do not find evidence of a consistent
effect of conditioning. The number of statistically
significant differences in their analysis
was less than what one would expect by chance
(see also McCormick et al., 1992, for additional
analyses of SIPP).

¹ Neter and Waksberg (1964a) do not use the first interview because it differs
on key design features (e.g., length of recall period).

Frick et al. (2004) find the opposite in an analysis
of changes in the Gini coefficient measur- 
ing income inequality for the German Socio- 
Economic Panel (SOEP). Comparing the trends 
in a rotating panel design, it is found that the 
first interview of a newly introduced panel indi- 
cates higher income inequality than data from 
an older panel which is on later waves. Once
respondents in the new panel are interviewed
several times, however, the measures converge
with those provided by respondents to the longer-running
sample. After ruling out the possibility
that this was due to either missing data or panel 
attrition, Frick et al. conclude that the difference 
was due to respondents getting better at the 
response task after the first several interviews. 

The conclusion that accuracy of income 
reports increases over panel waves is supported 
by Rendtel et al. (2004) who analyze the Euro- 
pean Community Household Panel (ECHP). 
Income information from registers is available
for respondents from three countries participating
in this panel (Denmark, Finland, and Sweden).
By comparing the survey data to the register
data, the analysis is able to assess whether
reports at later waves also change in accuracy.
Confirming the hypothesis by Frick et al. (2004), 
they do find accuracy to increase slightly over 
the first five waves of the survey. They also find 
indirect indicators of quality (e.g., missing data; 
use of estimation, rather than precise methods) 
to increase with panel waves. 

Other studies examining reports of behaviors 
have looked at reports of medical conditions. 
This topic is similar to consumer behav- 
ior because it is relatively well defined and 
behaviorally-based. Corder and Horvitz (1989) 
compare quarterly estimates from a panel
survey to estimates from a cross-sectional survey
conducted during the same period. They
examine a wide range of phenomena, including 
hospital discharges, first hospital visits,
physicians’ visits, and reporting of acute conditions.
They find no evidence of panel conditioning by 
comparing the trend in the panel data to the 
trend in the cross-sectional data. 

Cohen and Burt (1985) find that collecting 
data more frequently on a panel led to sig- 
nificantly fewer medical events and medical 
expenditures being reported. This study compared
these measures for a group of people
interviewed five times to a group interviewed
four times during the same one-year period. The
magnitude of the differences ranged from 7%
to 16%. The authors argue that the greater data
collection frequency led to reductions in reporting.
This argument was supported by evidence
that respondents reported more accurately in
the four-round group, as indicated by a greater
concordance with provider records collected
during a follow-up of providers whom the
respondent reported seeing.

While both expenditures and visits to the 
doctor are well-defined events, there are other 
behavioral phenomena that have more ambigu- 
ity associated with them. One might argue that 
conditioning may be greater for these types of 
events because respondents are more likely to 
use the first interview to learn about the objec- 
tives of the survey than may be the case for more 
straightforward types of behaviors. Respon- 
dents may become better prepared to answer 
questions, once they have been exposed to the 
entire questionnaire (Waterton and Lievesley, 
1989; Cannell et al., 1981; Biderman and 
Cantor, 1984). An alternative argument is that 
respondents may figure out ways to reduce bur- 
den by saying they do not have the condition 
or were not engaged in the behavior, e.g., when 
an affirmative response to questions about the 
behavior leads to follow-up questions asking for 
details about the behavior. 


One example of this is reports of victimization.
These events are subject to idiosyncratic
definitions that are influenced by the cues
provided to respondents (Cantor and Lynch, 
2000). The National Crime Survey (NCS) has a 
rotating panel design and exhibits significant 
decreases in reports of victimization at later 
panel waves. The largest drop is between the 
first and second interview (30%-40%). This
is followed by drops of around 15% in later 
waves of the panel (Cantor, 1989). The large 
drop between the first and second interview is 
explained by the fact that the reference period 
for the second interview is temporally bounded 
by the first interview. The first interview does 
not have this temporal bound. Bounding is 
thought to minimize telescoping events into the 
reference period (Neter and Waksberg, 1964b). 

A related phenomenon is self-reported per- 
petration of criminal or delinquent behavior. 
Thornberry (1989) compared time trends for the 
National Youth Survey (NYS) to the Monitoring 
the Future (MTF). The NYS is a panel survey 
and MTF is a repeated cross-sectional survey. 
Significant differences in the trend of these data 
were found. The author concludes that this is 
the result of conditioning on the NYS. Menard 
and Elliott (1993) re-analyze the same data and 
argue that the comparisons made by Thornberry 
do not appropriately consider the differences 
in methodology (e.g., question wording, sam- 
ple design). For example, the cross-sectional 
portion of the MTF is of high school seniors, 
whereas the NYS is a sample of youth aged
11 to 24 (depending on the cohort). In this
re-analysis, no significant evidence of conditioning
for the NYS was found.

A third, relatively ambiguous set of behav- 
iors is unemployment. The definition of unem- 
ployment depends on idiosyncratic definitions 
of labor force participation. Behavior related 
to “looking for work” might be difficult to 
define uniformly across respondents. Signifi- 
cant drops in measures of unemployment have 
been found in several different labor force
surveys (Bailar, 1989; Ghangurde, 1982). Bailar
compares different labor force statistics across
times in sample for the Current Population
Survey (CPS). Rates of unemployment at the 
first interview are found to be 7% higher than 
the average over the life of the panel. There are 
also small, statistically insignificant, drops in 
the rate over the next two interviews. 

For almost all of the above studies, sig- 
nificant effects take the form of a reduction 
in the reporting of the particular phenomena 
(e.g., expenditures, victimization, unemploy- 
ment). A “burden” explanation for the drop is 
that respondents are avoiding the extra ques- 
tions associated with reporting the behavior (e.g., 
unemployment). A second possibility is that
respondents are better prepared or more motivated
to answer questions after the initial interview.
There is no direct evidence to address
either explanation. Re-interview data from the
CPS (Bailar, 1975) suggest that estimates of
unemployment generally are too low. One might
therefore assume that higher rates of unemploy- 
ment represent more accurate information. It is 
the case, however, that there is substantial error 
associated with the re-interview data as well. 
Even if one accepts the re-interview as a less 
biased estimate, we are not aware of any data that 
breaks it out by time in sample. Furthermore, it is
not entirely clear how respondents are avoiding 
burden when reporting activities related to being 
in or out of the labor force. 

With respect to unemployment, a further possibility
is that the first interview is not
bounded by any other interviews. Determina- 
tion of unemployment status relies on a refer- 
ence period of four weeks. It may be that some 
of the pattern is due to respondents telescoping 
from outside the four-week period. For exam- 
ple, it may be that the subject did look for a job 
three months ago, but the respondent is misdating
it at the first interview. This may not occur at
the other times in sample, which are bounded by
a previous interview. In addition, respondents may not quite
understand the precision required by the time 
reference at the first interview. They may be 
relatively eager to respond in a positive way 
and may tend to include behavior from outside 
the reference period. Once they complete the first
interview, they become more knowledgeable
about these information demands and are less 
likely to report labor force activity outside the 
reference period. 


5 Conditioning and reports of 
attitudes, opinions and subjective 
phenomena 


Attitudes, opinions, and subjective phenom- 
ena are less well defined than the behav- 
ioral phenomena above. Van der Zouwen and 
van Tilburg (2001) discuss panel condition- 
ing of these types of reports in the context of 
trying to measure a latent characteristic. They 
make the distinction between interviews chang- 
ing respondent attitudes or changing the way 
the questionnaire items measure the attitude. 
The former is equivalent to changing the latent 
trait that is being measured, similar to changing 
behavior as discussed in Section 3. The latter 
relates to changes in the measurement of the 
latent trait. While the distinction is important, 
no evidence has been generated that allows sep- 
arating the two. Consequently, for purposes of 
discussion, we combine both in this section. 
For behavioral questions such as victimiza- 
tion, reporting an event often leads to getting 
asked additional questions. For attitudes and 
opinions, no such branching is typically used. 
Perhaps for this reason, a number of researchers 
that find conditioning effects argue that data 
are improved because of a learning or moti- 
vation effect over panel waves. The empirical 
evidence of a panel effect for attitudes, opin- 
ions, and subjective reports is mixed. In some 
cases, there seem to be effects of conditioning 
on the expression of certain attitudes, while for 
other topics there are no effects found. Bridges
et al. (1977) hypothesize that panel condition-
ing will occur for those topics that respon- 
dents perceive as being important, but are not 
particularly informed about. If respondents are 
not informed about a topic covered at an ini- 
tial interview and they view it as important, 
they will form opinions that are expressed at 
the follow-up interviews. For attitudes that are 
already formed or refer to unimportant topics, 
no changes will occur. 

To partially test this, they conducted an 
experiment looking at two different attitudes. 
At the first interview, one half of the sample 
was interviewed about their concern about can- 
cer. The other half was interviewed on their 
concern about burglary prevention. Both groups 
were asked two general questions about their 
concern about good health and about crime. 
At the second interview, both groups were 
given identical questionnaires with items on 
both cancer and burglary. The results show 
that those interviewed about cancer in the 
first interview exhibited an increased concern 
about good health at the second interview. 
A similar pattern did not result for those ini- 
tially interviewed about burglary. Those inter- 
viewed about burglary at the first interview did 
not show an increase in concern about crime 
in the second interview. The change in atti- 
tudes about general health ranged from 8% to 
11%. They also find that a mailing about bur- 
glary prevention that was in between the two 
interviews increased the change in responses 
between interviews. Those that received the 
mailing were more likely to report concern for 
both health and crime at the second interview. 
This effect was of approximately the same magnitude
as that of prior interviews (around 10%).

Bridges et al. account for this pattern by not- 
ing that respondents had more specific con- 
cerns about burglary than about cancer. Less 
concern is indicative of how well formed the 
attitude is in the respondent’s mind. The less 
formed, the more likely the respondent will 
be to change their answer. The authors offer
evidence of this by noting that at the initial 
interview, respondents showed higher concern 
about burglary than cancer. With respect to the 
mailing, the authors interpret this as evidence 
that providing more information about safety 
issues raised concerns on not only crime, but 
also general health issues. 

A related explanation has been given to 
observed panel effects for life satisfaction. 
Analysis of the German Socio-Economic Panel 
(SOEP) finds that these measures decrease over 
time (Landua, 1991; Frick et al., 2004; Jürges,
2005). From data contained in Frick et al. 
(2004), approximately 20% more respondents 
select the top three scale points on the satisfaction
question when interviewed the first time
relative to those who had been asked in prior
interviews (Table F-1 for 2002, balanced cross-
section). Graphs in Jürges (2005) show a drop
from 40% of respondents choosing the most sat- 
isfied category in 1984 to less than 5% 14 years 
later. This compares to a much smaller decrease 
for a repeated cross-sectional survey conducted 
during this same time period (about 20% to 
13%). Frick et al. (2004) argue that the drop 
over the panel is due to respondents becoming 
more familiar with their satisfaction by thinking 
about the question prior to the second, third,
and later interviews (see also Jürges, 2005).

Under this interpretation, the “learning effect” of
repeated interviewing reduces overall measurement
error. However, no empirical data are provided
that favor this explanation over others that would
attribute the drop to increasing error (e.g., due to
changes in interviewers; increases in social desirability).

Social desirability has also been hypothe- 
sized as a cause of panel conditioning. Pevalin 
(2000) cites several studies that observed a 
decrease in mental health scores as measured 
by the General Health Questionnaire (GHQ).
These studies conducted interviews over both 
a relatively short (e.g., multiple interviews 
within a one-year period) and long (once a 
year for multiple years) periods of time with
both specialized and general population sam- 
ples. None of these studies had a parallel group 
that controlled for trends over time or possible 
effects of aging. They also used different ver- 
sions of the GHQ (12 item vs. 30 item vs. 60 
item). A few of these studies did conduct more 
detailed diagnostic interviews as part of the pro- 
tocol. These few studies found evidence that 
the GHQ lost sensitivity over time. The drop in
the GHQ over time was attributed to respondents’
increasing desire to appear mentally healthy
to the interviewer.

Pevalin’s own analysis of the British Household
Panel Survey (BHPS) compared the GHQ 12-item
version to an annual cross-sectional survey
done in Britain (the Health Survey for England,
HSE). No evidence of conditioning was
found. Unlike prior studies cited in the article, 
scores increased, rather than decreased, over 
time. The author hypothesizes that the differ- 
ence from other studies could be the relatively 
long time interval between survey administrations
for the HSE and BHPS. The large re-test
effects found in other studies tended to occur
when the interval between administrations was
considerably shorter, with the survey being conducted
multiple times in a year.

Sobol (1959) examines attitudes related to 
perceptions of individual economic well-being 
(e.g., better off than last year? Expect to be bet- 
ter next year?). The analysis initially finds a 
significant drop in these attitudes relative to 
a cross-sectional survey. However, once attrition
in the panel is controlled for, the differences
disappear. Those in
lower income groups, renters and those not
interested in the survey tended to drop out after
the first interview. When only those who stayed
in the panel for the entire period are examined,
no trend in the measures is found.

Waterton and Lievesley (1989) test several 
different hypotheses on the causes of condi- 
tioning with the British Social Attitudes Panel. 
This was a small panel (approximately 800 at 
the first wave) that originated with the Social
Attitudes Survey. By comparing measures from 
the panel to cross-sectional measures for a 
series of three different years, they test whether 
repeated interviewing changes attitudes by 
examining changes in political attitudes over 
the three waves of the study. They find changes 
do occur in the predicted direction. There are 
increases in political partisanship of approxi- 
mately 7 percentage points over a 3-year period 
(51% to 58%). Similarly, they test whether 
conditioning occurs to reduce the social desir- 
ability bias over time by examining six dif- 
ferent items on racial prejudice, social action 
(protesting an unjust law; break law to fol- 
low conscience), refusing an income question, 
propensity to break the law, and newspaper 
readership. They find ambiguous evidence for 
this, with three of the items being in the right 
direction, but only one of these being statis- 
tically significant. It should be noted, how- 
ever, that the panel had relatively small sample 
sizes to detect rather low prevalence rates (e.g., 
4.9% of the sample who admit to being racially 
prejudiced). 
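
The point about small samples and low prevalence can be illustrated with the standard error of a proportion. The sketch below assumes simple random sampling, a normal approximation, and the approximate first-wave size of 800 quoted in the text. These are simplifying assumptions for illustration; later waves would be smaller because of attrition, and the actual design effects are not given in the source.

```python
from math import sqrt

def proportion_se(p, n):
    """Standard error of a proportion under simple random sampling."""
    return sqrt(p * (1 - p) / n)

n = 800       # approximate first-wave panel size quoted in the text
p = 0.049     # share admitting to racial prejudice

se = proportion_se(p, n)
half_width = 1.96 * se   # half-width of an approximate 95% interval

print(f"Standard error: {se * 100:.2f} percentage points")
print(f"95% interval: {100 * (p - half_width):.1f}% to {100 * (p + half_width):.1f}%")
```

With an interval roughly three percentage points wide around a 4.9% estimate, a change of one or two points between waves would be hard to distinguish from sampling error.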

They also test for social desirability a sec- 
ond way by examining the inter-item correla- 
tions among related items. They hypothesize 
that the inter-item correlations should go down 
over time because respondents are not attempt- 
ing to stay consistent within a questionnaire. 
They do not find any evidence supporting this 
hypothesis for any of the items tested. 

Seemingly large panel effects were found in
a study related to a panel survey on young
people’s (age 16-24) intention to enlist in the
military (Nieva et al., 1996). The Youth Atti- 
tude Tracking Survey (YATS) was an ongoing 
general population telephone survey that had 
a rotating panel design. The panel was set up 
to save on the costly process of screening for 
youth who were eligible for the interview. Com- 
parison of the first interviews to those inter- 
viewed a second and third time found drops 
in the respondent’s desire to enlist by as much 
as 33% (e.g., 20.7% to 14.7%). This study had
significant panel attrition, as no allowance was
made to track those who moved between panel
waves. Approximately 40%-50% of the initial
sample was not interviewed a second time. For 
this reason, the above results are confounded by 
nonresponse bias. Those who were most likely 
to drop out tended to be the most positive on the 
enlistment propensity measure. Nonetheless, 
using various nonresponse adjustment strate- 
gies, the significant drop in propensity was not 
dramatically affected. 
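
To show how attrition can be confounded with conditioning, and how a nonresponse adjustment responds to it, the sketch below applies a simple weighting-class adjustment. All numbers, class labels, and function names are hypothetical and chosen only for illustration; the source does not describe which adjustment strategies Nieva et al. (1996) used, and this is not a reconstruction of their method.

```python
# Hypothetical weighting classes defined by wave-1 answers.
# Each entry: (wave-1 sample size, wave-2 respondents,
#              wave-2 share intending to enlist among respondents)
classes = {
    "high initial propensity": (200, 80, 0.40),
    "low initial propensity": (800, 520, 0.05),
}

def unadjusted_mean(classes):
    """Wave-2 estimate from respondents only, ignoring attrition."""
    total = sum(n2 for _, n2, _ in classes.values())
    return sum(n2 * share for _, n2, share in classes.values()) / total

def adjusted_mean(classes):
    """Wave-2 estimate with each respondent weighted by the inverse of
    the retention rate in their class."""
    weighted_sum = 0.0
    weight_total = 0.0
    for n1, n2, share in classes.values():
        weight = n1 / n2                  # inverse retention rate
        weighted_sum += n2 * weight * share
        weight_total += n2 * weight
    return weighted_sum / weight_total

print(f"Unadjusted wave-2 estimate: {unadjusted_mean(classes):.3f}")
print(f"Adjusted wave-2 estimate:   {adjusted_mean(classes):.3f}")
```

In this made-up example, the class with higher initial propensity drops out more often, so the unadjusted estimate understates propensity and exaggerates the apparent decline. The adjustment only helps to the extent that dropouts resemble stayers within each class.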

The conflicting results across these studies
point to the need to develop more detailed
explanations for the occurrence of condition- 
ing effects. Up to this point in time, there has 
been little in the way of validating the com- 
peting explanations discussed by Waterton and 
Lievesley (1989). Consequently, it is difficult to 
predict when conditioning effects might occur, 
either with respect to content area or survey 
design. In the summary below, we describe the 
different frameworks that might be used in try- 
ing to disentangle conditioning effects. 


6 Summary and discussion 


The results of selected studies of panel con- 
ditioning are summarized in Table 8.1. Stud- 
ies on panel conditioning have found effects 
that should concern analysts of longitudinal 
data. Generally, the size of these effects ranges
from 5% to 15%, with a few finding greater 
effects. There are a number of studies that have 
not found significant panel effects. There are 
also instances where the same phenomena are 
studied with different results (e.g., consumer 
purchases). The mixed results may be partly 
due to the complicated analytic requirements 
to identify the effects and to be able to pin- 
point the causes. In order to identify condi- 
tioning effects, it is necessary to separate real 
change due to trends in the phenomena of 
interest, effects of panel attrition, and changes 
that occur across panels (e.g., interviewers and 
interviewer-respondent interactions).
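
The effect-size measure defined in the note to Table 8.1, [(C-P)/C] x 100, can be written as a small function. The sketch below is a minimal illustration; the function name `effect_size` and the input values are hypothetical and are not taken from any of the studies reviewed.

```python
def effect_size(cross_section, panel):
    """Relative difference between estimates, [(C - P) / C] x 100."""
    if cross_section == 0:
        raise ValueError("cross-sectional estimate must be nonzero")
    return (cross_section - panel) / cross_section * 100.0

# Hypothetical estimates (percent reporting a behavior), for illustration only.
c_estimate = 20.0   # cross-sectional survey
p_estimate = 17.0   # panel survey at a later wave

print(f"Effect size: {effect_size(c_estimate, p_estimate):.1f}%")
```

A positive value means the panel estimate falls below the cross-sectional one, and a negative value means the panel estimate is higher.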


We have organized the above discussion 
around a continuum with one end being behav- 
iors that may be changed as a result of an inter- 
view, the middle being reports of behaviors that 
might be changed with no real change in behav- 
ior, and the other end being changing latent 
traits, such as attitudes, opinions, and subjec- 
tive assessments. This continuum reflects the 
different content areas affected, as well as the 
way one might measure conditioning effects. 

Several behaviors have been found to be 
affected by conditioning. In the context of 
voting, a pre-election interview increases the 
respondent’s sense of civic duty, which directly 
affects their voting propensity. It may also 
increase the perceived obligation to vote. With 
respect to medical conditions, the interview 
brings to the forefront issues that are of
personal concern to the respondent. For example,
the review above cited the case of asking
about arthritis where it was hypothesized that 
conditioning may have motivated respondents 
to see a doctor and get it diagnosed (e.g., when 
someone has been experiencing pain, but had 
no explanation). A similar motivation was sug- 
gested when asking about vaccinations. 

Table 8.1 Summary of selected studies testing for panel conditioning by type of measure

Study                         Variable                                Analysis                                     Effect size*

Change behavior
Clausen, 1968                 Voting                                  Compared to cross-section                    8%
                                                                      Voting records
Traugott and Katosh, 1979     Voting                                  Compared to cross-section                    9%
Wilson and Howell, 2005       Arthritis                               Compared to cross-section                    12%
Veroff et al., 1992           Marital stability                       Compared to group with less intense          14% to 100%,
                                                                      interviews                                   Blacks only
Battaglia et al., 1996        Immunization                            Checked records after interview              3%

Reporting behaviors
Neter and Waksberg, 1964a     Expenditures on household maintenance   Rotating panel design                        9%
Silberstein and Jacobs, 1989  Consumer expenditures                   Rotating panel design                        ns
Pennell and Lepkowski, 1992   Income                                  Rotating panel design                        ns
Frick et al., 2004            Gini coefficient for reports of income  Rotating panel design                        can’t compute
Corder and Horvitz, 1989      Hospital discharges                     Compared to cross-section                    ns
                              Hospital visits
                              Physicians’ visits
                              Acute conditions
Cohen and Burt, 1985          Medical events                          Compared 5 rounds of interviews to 4 rounds  7%-16%
                              Medical expenditures                    of interviews
Cantor, 1989                  Victimization                           Rotating panel design                        15%
Menard and Elliott, 1993      Criminal offending                      Compared to cross-section                    ns
Bailar, 1989                  Unemployment                            Rotating panel design                        7%

Reports of attitudes, opinions and subjective phenomena
Bridges et al., 1977          Concern with health                     Experimented with inclusion of question in   8%-11% for health
                              Concern about crime                     one group                                    ns for crime
Frick et al., 2004            Life satisfaction                       Compared to cross-section                    20%
Pevalin, 2000                 Mental health                           Compared to cross-section                    ns
Sobol, 1959                   Perceptions of economic well-being      Compared to cross-section                    ns
Waterton and Lievesley, 1989  Political attitudes                     Compared to cross-section                    13% for political attitudes;
                              Racial prejudice                                                                     ns for other attitudes
                              Social action
                              Propensity to break law
Nieva et al., 1996            Propensity to join military             Rotating panel design                        33%

*Computed as [(C-P)/C] x 100, where C = cross-sectional estimate, P = panel estimate; ns = not statistically significant.


The evidence and analyses are not definitive.
It is difficult to predict when conditioning
will change behaviors. Asking about medical
conditions, for example, does not have the
same effect for all respondents and all medical
issues. For example, as noted above, Cohen
and Burt (1985) report decreases, rather than
increases, in reports of health care utilization
and expenditures with greater interviewing frequency.
Similarly, Corder and Horvitz (1989)
do not report any indications of conditioning
for reporting medical expenditures. There is
no apparent change for consumer expenditures.
What may distinguish the studies that involve
changing behavior is that they involve behaviors
that are relatively easy to engage in (e.g.,
voting). In addition, changes may only occur
for populations that are particularly close to or
affected by the topic. For example, the study by
Wilson and Howell (2005) studied adults aged
55-56 years old. This is an age when people 
may be just getting concerned with arthritis. 
For vaccinations, new mothers may be the most 
susceptible to a conditioning effect. The fact 
that Battaglia et al. (1996) found this effect 
was slightly greater for the most economically 
marginalized is consistent with this idea. 
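
The effect size defined in the footnote to Table 8.1 is a simple percent difference between the cross-sectional and the panel estimate. As a minimal sketch (the function name and the example figures are illustrative assumptions, not values taken from any of the studies cited), it could be computed as follows:

```python
def conditioning_effect_size(cross_section: float, panel: float) -> float:
    """Percent difference [(C - P) / C] x 100 between two estimates."""
    if cross_section == 0:
        raise ValueError("cross-sectional estimate must be non-zero")
    return (cross_section - panel) / cross_section * 100.0


# Hypothetical estimates: 50.0 from a cross-section, 46.0 from a panel
print(round(conditioning_effect_size(50.0, 46.0), 1))  # 8.0
```

A positive value means that the panel estimate is lower than the cross-sectional estimate; a negative value means that it is higher.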
Explaining why reports of behaviors are 
affected by conditioning is more complicated. 
When effects are observed, the pattern is for 
quantities or rates of behaviors to decrease over 
the life of the panel. This pattern is not observed 
for all those discussed above. In the case of 
reporting income, consumer expenditures, and 
medical events, there were studies that found 
both negative and positive evidence of conditioning.
This ambiguity may be related to factors
that are not easily controlled. For example,
the spacing between interviews may have
significant effects on conditioning. Shorter
intervals between interviews should increase
effects because they make it more likely that
respondents remember what happened during
the previous interview (e.g., Kalton et al.,
1989). Interviewer effects are uncontrolled in
all but one of the studies reviewed above.
Two explanations have been offered for why
conditioning may affect reports of behaviors.
The first, a burden explanation, is that
respondents learn that affirmative answers lead
to additional questions and then avoid answering
affirmatively in the future. The underlying
assumption is that respondents may actively
avoid questions if they understand the
implications of answering in a particular way. A
common model that summarizes this hypothesis
is that respondents tend to satisfice (Krosnick,
1991) more in later waves of a survey. Once
the uniqueness of the survey task wears off, 
respondents view the task more routinely and 
may not exert as much effort at hard tasks, 
such as recall over extended periods of time. 
There is very little data with which to assess
the burden explanation. We found almost no
studies showing that respondents avoid certain
responses because they believe those responses
will increase the burden of the survey. The one
example that we could find is a study by Turner
(1984), who reports that fewer crime incidents
are reported when respondents are asked detailed
questions right after they answer affirmatively
to each screening question. Even in this case,
however, it is not clear whether respondents are
reacting to the burden or are using the detailed
questioning to get a better idea of the type of
phenomena of interest to the study. If burden
does play a role in responding, the conditions
under which it does so are unclear. It is not
clear how it applies across panel waves, for
example. Similarly, how do perceptions of burden
relate to short versus long interviews? One might
expect, for example, the effect to be a function
of the overall burden and sensitivity of the
survey. Long surveys may tempt respondents to
avoid burden more than relatively short surveys do.

A view that burden is a key driver differs 
sharply from the view that respondents are gen- 
erally motivated to respond, learn about the 
survey at the first administration and use this 
knowledge when answering questions at subse- 
quent waves. Under a learning hypothesis, the 
respondent is seen as initially being very anx- 
ious to provide responses that might be of use 
to the study. It is partly through this motiva- 
tion that they may actually over-report when 
initially coming up with a response. After they 
complete the survey the first or second time, 
their understanding and response formulation 
are adjusted to reflect this additional thinking 
(e.g., Biderman et al., 1986; Cannell et al., 1981). 

One way to distinguish between these two 
hypotheses would be to examine measures of 
data quality over the life of a panel survey. 
The burden hypothesis would predict that data 
quality decreases, while the learning/motivation 
hypothesis predicts that it improves. While 
theoretically this question could be addressed 
with standard survey methods, there are very 
few studies that have actually done so. Bailar 
(1989) reports on several studies that seem 
to show a decrease in the quality of reporting
expenditures (Pearl, 1979) and illnesses
(Mooney, 1962). As noted above, Cohen and Burt 
(1985) also show a decrease in the correlation 
between reports and records for provider vis- 
its. Contrary evidence is reported for the report- 
ing of income by Rendtel et al. (2004) who find 
improved data quality for reports of income 
when checking against records. 

One might also expect variation in panel 
effects across different types of behavioral phe- 
nomena. Under a learning/motivation model, 
behaviors that are relatively well defined may 
not greatly benefit from a prior interview. Thus 
phenomena like hospitalizations and income 
may not exhibit significant conditioning effects. 
On the other hand, for phenomena that are more 
difficult to define, there would be larger con- 
ditioning effects because the survey provides a 
great deal of information that the respondent 
can use the next time. The evidence on this is 
decidedly mixed across studies, as evidenced 
by conflicting effects for the same phenomena 
(e.g., consumer expenditures). 

This suggests that the reasons for condition- 
ing are more complicated than the above expla- 
nations would imply. Recent cognitive and 
social psychological theories of the response 
process do view respondents more along a 
continuum with respect to their approaches 
to the response task (Tourangeau et al., 2000, 
pp. 1-22). Some respondents will be anxious to 
exert effort to do whatever it takes to complete 
the task. This would include trying very hard 
to understand the meaning of the questions, 
recalling information from memory (or taking 
actual or mental notes between panel waves) 
and fitting their responses within the structure 
of the survey. Others will exert as little effort 
as possible, while many will be somewhere in 
the middle. It is likely that survey
characteristics (e.g., difficulty of the task), the
interest respondents have in the topics, and the
respondent's cognitive abilities, among other
things, would influence the extent to which the
burden or learning/motivation hypotheses hold.

Similarly, the influence interviewers might 
have on how well respondents carry out the 
task may be very important. Interviewers may 
change between waves and collect the data in 
different ways (e.g., how they probe), which 
may lead to interviewer effects. Conversely, the 
same interviewer may probe in ways that lead to 
a systematic change over time. For example, van
der Zouwen and van Tilburg (2001) found from
recordings of interviews a tendency for some
interviewers to use data from the prior interview
to shape the answers at subsequent waves.
They found this led to a decrease in the number
of reports of interest (network size). In addition,
they argue that the use of these data by the
interviewer is shaped by a desire to keep the
interview as short as possible. Rather than an
effect that is due to respondents avoiding burden,
these data indicate that it is the interviewer
who is shaping the responses, through use of
data from the previous interview, to reduce the
number of reports.

The length of a survey is not clearly linked to 
how one answers attitudinal questions. Therefore,
when it comes to explaining conditioning
for latent characteristics, such as attitudes,
burden is not clearly relevant. Perhaps for this
reason, the causes of conditioning in attitudinal
data are not as much a subject of debate
in the literature. Most of those who
find conditioning effects on attitudes or opin- 
ions attribute changes to a learning effect (Stur- 
gis and Allum, 2004; Waterton and Lievesley, 
1989). For example, Sturgis and Allum (2004) 
argue that exposure to attitude questions at 
the initial interview stimulates respondents to 
think more about the topic. This crystallizes 
opinions that had not been considered up to 
that point. The stimulus strengthens or even 
changes the attitude measured at the second 
and subsequent interviews (see also Frick et al., 
2004). Interestingly, these discussions do not 
make much of the difference between a change
in the attitude itself and a change in how the
attitude is reported.

As in the case of conditioning and reports 
of behaviors, there is very little evidence that 
would support a learning/motivation hypothe- 
sis. However, theories of the response process 
as they apply to the context and order effects 
for attitude questions (Sudman et al., 1996; 
Tourangeau et al., 2000) are relevant when pre- 
dicting conditioning effects. For example, the 
inclusion/exclusion model views context effects 
as a function of what information is accessi- 
ble in memory to construct an answer. Pre- 
existing knowledge used to answer a question is 
called “chronically accessible.” Context effects 
occur when information temporarily added 
from exposure to a prior question is more 
influential than chronically accessible infor- 
mation. Context effects are difficult to predict 
because they depend on not only the ques- 
tions, but the relationship between chronically 
and temporarily accessible information. 

Of course conditioning is different from con- 
text effects. The latter are explained by refer- 
ring to temporarily accessible memory from 
immediately preceding questions. They are less 
likely to occur as the spacing between simi- 
lar questions increases. Nonetheless, the basic 
idea of prior questions serving as stimuli for 
generating information relevant to formulat- 
ing responses is how the learning model of 
conditioning is proposed to operate. In this 
case, the initial exposure to a question adds 
to chronically accessible information that is 
used when answering items at subsequent
interviews. Information may also be added through
any stimulus effect the initial interview has
on respondents' thinking about the attitude
object. When the re-interview occurs, respondents
draw on this information and may form a
different judgement than at the prior interview,
because different information is available for
use when making the judgement. Note that
this explanation does not necessarily rely on
a respondent explicitly remembering that they
answered the question the last time. It only
relies on the assumption that the process of
answering the question the first time changes
what is eventually accessible in memory the
next time the question is asked.

The review in this chapter points to evi- 
dence of conditioning effects for questions 
about behavior and latent characteristics. To 
fully understand the boundaries around when 
these effects occur, it will be necessary to design 
studies that can make important distinctions 
between the competing explanations. Drawing 
on more sophisticated models of the survey 
process, as well as simulating the conditions 
surrounding panel designs, may be the most 
promising way to make progress in understand- 
ing how these effects influence the accuracy of 
panel estimates. 

Chapter 9


Reliability issues in longitudinal 
research 
Toon W. Taris 


1 Introduction 


The reliability of a concept refers to the degree 
to which consecutive measurements of this con- 
cept yield the same result, given that the under- 
lying score on the concept has not changed. 
In this chapter we deal with various issues 
relating to the reliability of measures across 
time. In Section 3 we discuss a procedure to 
establish whether the associations among the
items of repeatedly measured multi-item measures
have remained the same across time. If so,
the reliability as well as the meaning of this 
concept has not changed across time—which is 
imperative if one is to make across-time com- 
parisons. Then we discuss the implications of 
measurement unreliability for across-time com- 
parisons, focusing on issues such as regression 
to the mean (the tendency for subjects with 
extreme scores to obtain scores that are closer to 
the group average at subsequent measurements) 
(Section 4), the reliability of change scores, and 
the regression fallacy, i.e., when change in the 
criterion variables that is due to measurement 
unreliability is attributed to the independent 
study variables (both in Section 5). 


2 Reliability issues in longitudinal 
research 


Classical test theory defines reliability in terms 
of the degree to which consecutive measure- 
ments of a particular concept yield the same 
result, given that the underlying score on the 
concept has not changed. The scores on any 
given measure are presumed to reflect (1) the 
score on the concept of interest (the true score), 
and (2) error, which may include bias (sys- 
tematic error, e.g., a scale that consistently 
underestimates weight by 5 kilograms, and 
which reduces validity but not reliability) and 
random error in which the error is not con- 
sistently one of overestimation or underestima- 
tion (and which reduces both reliability and 
validity). For example, a person’s score on 
an intelligence test may reflect his cognitive 
ability, but perhaps also administration mode- 
related factors (e.g., experience with computer- 
administered tests), contextual factors (noise, 
temperature), or familiarity with intelligence 
tests. Researchers usually aim to maximize 
true score variance, relative to error variance; 
measures that largely reflect error rather than
respondents’ true scores are useless as they can- 
not tell us much about the phenomena that 
interest us nor about their interrelations. Thus, 
researchers must have some idea of the reliabil- 
ity of their measures, as this directly affects the 
strength of the inferences that can be made on 
the basis of their study. 

Some phenomena in social, educational, and 
behavioral research can easily be measured 
reliably. For example, concepts such as partici- 
pant gender, year of birth, or level of education 
can be measured accurately with a single item. 
For such concepts the true score variance/error 
variance ratio will be acceptable, even if just 
a single item is used. However, in other cases 
this ratio will be less favorable, and in that 
case multi-item measurement is needed to 
obtain reliable estimates of the participants’ 
true scores on the phenomena of interest. 
For instance, personality traits, attitudes, 
intentions, behaviors, and well-being are often 
measured using multi-item measures. The 
scores on the separate items of these measures 
may contain a large error component, but as 
these errors are presumed to be due to random 
factors, these should largely cancel each other 
out. Multi-item measures will therefore give 
a more reliable indication of the participants’ 
true scores than single-item measures. 
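
The claim that random item errors largely cancel out in a multi-item measure can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: true scores and errors are drawn from normal distributions, every item is the true score plus independent random error, and the sample size, error standard deviation, and function names are all hypothetical choices.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def scale_true_score_correlation(n_items, n_people=2000, error_sd=2.0, seed=1):
    """Correlation between true scores and the mean of n_items noisy items."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    # Each item = true score + independent random error; the scale is the item mean
    scale_scores = [
        sum(t + rng.gauss(0.0, error_sd) for _ in range(n_items)) / n_items
        for t in true_scores
    ]
    return pearson(true_scores, scale_scores)


for k in (1, 3, 10):
    print(k, round(scale_true_score_correlation(k), 2))
```

Under these assumptions the correlation between the scale score and the true score rises as items are added, because the averaged error shrinks while the true score component stays the same.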


Reliability estimation 

In longitudinal research, two forms of reliability 
estimation are especially important: (1) coef- 
ficients of internal consistency (taking into 
account the degree to which the components 
of the test are correlated) and (2) coefficients of 
stability (test-retest reliability). A third form of 
reliability estimation focuses on equivalence, 
i.e., the degree to which measures that are
presumed to measure the same construct are 
correlated. 


1. Internal consistency estimation starts from
the assumption that the items belonging to a
particular scale should be considered replications,
so that similar responses should be
given to any pair of them. Less than per- 
fect correlation indicates that the items do 
not tap precisely the same facet or level of 
the underlying construct. Cronbach’s alpha 
coefficient is currently the most widely used 
example of this type of reliability estimate 
and focuses on the ratio of true score vari- 
ance and error variance. According to Nun- 
nally (1978), alpha should be .60 at minimum 
(and preferably exceed .70) to be acceptable, 
meaning that at the very least 37.5% of the 
variance on any given measure should reflect 
true score variance. Estimation of the inter- 
nal consistency of measures does not nec- 
essarily require a longitudinal design; alpha 
can be estimated if there are at least two 
measures (items) for the concepts of interest, 
which can easily be achieved within most 
cross-sectional studies. 

2. Estimation of test-retest stability obviously 
requires some time interval to pass between 
the test and the retest, and therefore always 
calls for a longitudinal design. As Robert 
Guion (2002) notes, the correct length of the 
interval between the test and retest depends 
on the time needed to stop remembering 
details of the “test” (e.g., the answers given 
to specific test items), and varies in practice 
from minutes to (sometimes) years (e.g., in 
the case of stable personality traits). 
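
As a minimal sketch of the internal consistency estimate mentioned under point 1, the function below computes Cronbach's alpha with the usual formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score). The function name and the response data are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, one per item, holding the scores of the same n respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across the k items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five people to three items
item1 = [2, 4, 3, 5, 1]
item2 = [3, 4, 3, 5, 2]
item3 = [2, 5, 4, 4, 1]
print(round(cronbach_alpha([item1, item2, item3]), 2))  # 0.94
```

As the text notes, nothing in this computation requires more than one wave of data; it only requires at least two items.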


Estimation of the test-retest reliability coef- 
ficient requires that the research units are 
ordered on the basis of their scores on the 
test and the retest; then the association between
both orders is established (e.g., in terms of 
the Pearson correlation coefficient). This pro- 
cedure draws heavily on the assumption that 
the structure of the concept under considera- 
tion does not change between the test and the 
retest. That is, the within-wave pattern of asso- 
ciations among the items should be the same 
across time, and it should still be plausible that 
these items all tap the same underlying concept.
If this assumption of structural invariance
is not warranted, it cannot be maintained that 
it is the same concept that is compared across 
time, and estimation of the test-retest stability 
is meaningless. The next section therefore dis- 
cusses a conceptual framework to examine the 
assumption of structural invariance in longitu- 
dinal research. 
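
The test-retest coefficient described above can be sketched as a Pearson correlation between the scores at two waves. The scores below are hypothetical, and the sketch simply takes structural invariance for granted, which is exactly the assumption the next section examines.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scale scores of six respondents at the test and the retest
wave1 = [12, 15, 9, 20, 17, 11]
wave2 = [13, 14, 10, 19, 18, 10]
print(round(pearson(wave1, wave2), 2))

# A uniform shift in level leaves the ordering, and hence the coefficient, intact
shifted = [score + 3 for score in wave1]
print(round(pearson(wave1, shifted), 2))  # 1.0
```

The coefficient reflects the stability of the ordering of respondents rather than the stability of their mean level, which is why a uniform increase in all scores does not lower it.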


3 A framework for examining 
structural stability in 
longitudinal research 


3.1 Alpha, beta and gamma change 


As suggested above, examining test-retest reli- 
ability presumes that the structure of the mea- 
sures to be correlated is the same across 
time. The issue of across-time stability is often 
addressed in terms of a set of concepts coined 
by Robert C. Golembiewski and his colleagues 
(1976). They distinguished among three types 
of change, the first of which is alpha change, 
referring to “...a variation in the level of some 
existential state, given a constantly calibrated 
measuring instrument related to a constant con- 
ceptual domain” (p. 134). Alpha change (which 
should not be confused with Lee Cronbach’s 
alpha coefficient for reliability) can be mea- 
sured in terms of the degree of change across 
occasions, e.g., regarding the average height in 
a group of teenagers or the scores on two con- 
secutive administrations of a mental ability test. 
Absence of alpha change implies that the test- 
retest reliability will be high; but, as said earlier 
on, this quantity is only meaningful insofar 
as the structure of the concept of interest has 
remained unchanged. This is what is meant
by the reference to "a constantly calibrated
measuring instrument" (italics added): the
structure of the instrument should be invariant
across time. For instance, if you step on a
balance you are probably interested in knowing
whether you have gained or lost weight,
relative to the previous occasion when you
measured your weight. The measurement of
change occurs within a fixed system of stable
dimensions of reality (the meaning of the concept
of "weight" does not change), as defined
by an indicator whose intervals are more or less
constant (the calibrated marks on the scale of
the balance).

Now imagine that the intervals between the 
marks on the scale of the balance would change 
across time. Then it would clearly be impos- 
sible to know whether you had gained or 
lost weight. Robert Golembiewski and his col- 
leagues (1976) refer to such a phenomenon 
as beta change, defined as “...a variation in 
the level of some existential state, complicated 
by the fact that some intervals of the mea- 
surement continuum associated with a concep- 
tual domain have been recalibrated” (p. 134). 
If beta change has occurred, chances are that 
the order of participants on the phenomenon 
of interest has changed, leading to low test- 
retest reliability. It is also possible that beta 
change has affected the pattern of associations 
among the items of your instrument, meaning 
that the internal consistency of your measure 
has changed across time. Of course, beta change 
regarding the scale of a balance is unlikely, but 
the scales of the balances used in social, edu- 
cational and behavioral sciences (our items and 
scales) can certainly be subject to beta change. 
For instance, you may judge the characteris- 
tics of your house (e.g., the size of the rooms, 
quality of the neighborhood) differently after 
having lived in it for two days than after two 
years; a couple in marital therapy may judge 
each other differently after a session or four, 
even if nothing has changed. Instances of beta
change in the social sciences often involve a
change in the perspective of the study
participants; given a clearer (or just a
different) perception of the situation, they may
highlight different aspects of this reality, due
to experience or maturational processes. These
are common processes in longitudinal research,
meaning that there is often the potential for the
occurrence of beta change.

Finally, Robert Golembiewski and his col- 
leagues (1976) referred to gamma change as 
involving “...a redefinition or reconceptual- 
ization of some domain, a major change in 
the perspective or frame of reference within
which phenomena are perceived and classi- 
fied, in what is taken to be relevant in some 
slice of reality” (p. 135). Whereas beta change 
refers to change in the intervals measuring a 
relatively stable dimension of reality, gamma 
change is a quantum shift in the conceptualiza- 
tion of this reality, manifesting itself in changes 
in the patterns of relationships among the com- 
ponents (items) of the measurement instrument 
(e.g., the number of dimensions of the measure- 
ment instrument). Clearly, across-time compar- 
isons of scores on a measurement instrument 
(including the estimation of a test-retest
reliability coefficient) are meaningless in the presence
of gamma change. 


3.2 Examining alpha, beta 
and gamma change 


Golembiewski et al. (1976) not only coined the 
terms alpha, beta and gamma change, they also 
proposed to examine these types of change by 
means of factor analysis. Factor analysis refers 
to a variety of statistical techniques whose 
objective is to represent a set of observed vari- 
ables (“items”) in terms of a smaller number 
of unobserved (or “latent”) underlying vari- 
ables (also called dimensions or factors). Factor 
analysis can help researchers decide whether a 
particular set of items taps the same factor by 
providing information about the number of fac- 
tors that can reasonably be assumed to account 
for the associations among the items, which 
variables belong to a particular factor, and how 
strongly the variables are affected by (or “load 
on”) this factor. Generally speaking, a single 
factor will emerge if the associations among the 
items are all about equally high. Conversely,
if there are multiple sets of items present that
are highly intercorrelated with each other but
not with items belonging to other sets of items,
more than one factor will account for the data.
If a single factor emerges, low intercorrelations 
among the items will translate into low factor 
loadings and a low reliability; the error variance 
on these items is large, relative to the variance 
they share (this shared variance is presumed to 
be indicative of the true score variance). Con- 
versely, high intercorrelations lead to high load- 
ings and a high internal consistency. 
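
The link between item intercorrelations and loadings can be illustrated numerically. The sketch below is not a full factor analysis: as a simplifying assumption it uses the first principal component of a correlation matrix as a stand-in for a single common factor (component loadings tend to be somewhat higher than common factor loadings), and the function names and correlation values are hypothetical.

```python
def first_component_loadings(corr, iterations=200):
    """Loadings of each item on the first principal component (power iteration)."""
    p = len(corr)
    vec = [1.0] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
        eigenvalue = norm  # converges to the largest eigenvalue
    # Loading = eigenvector element times the square root of the eigenvalue
    return [v * eigenvalue ** 0.5 for v in vec]


def uniform_corr(p, r):
    """Correlation matrix in which all p items intercorrelate equally at r."""
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]


high = first_component_loadings(uniform_corr(3, 0.6))
low = first_component_loadings(uniform_corr(3, 0.1))
print([round(v, 2) for v in high])  # [0.86, 0.86, 0.86]
print([round(v, 2) for v in low])   # [0.63, 0.63, 0.63]
```

With equally high intercorrelations all items load equally on the single dimension, and lowering the intercorrelations lowers every loading, in line with the argument above.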

In examining alpha, beta and gamma change, 
the recent literature has almost exclusively 
relied on one particular type of factor analy- 
sis, namely confirmatory factor analysis (CFA) 
as implemented in programs such as LISREL, 
EQS and AMOS. Although these programs were 
initially developed for a very specific type of 
analysis (i.e., confirming one’s a priori notions 
about the data using rigid statistical testing) 
they can also be very helpful in examining beta 
and gamma change; alpha change is usually 
tested using analysis of variance, although this 
could in principle also be achieved using the 
programs mentioned above. The two important 
virtues of CFA for examining beta and gamma 
change are that factor models can be specified 
on a high level of detail, and that statistical 
tests are available to test whether one factor 
model fits the data better than a competing fac- 
tor model. 

As regards the latter, programs for conducting
CFA all provide a range of statistics that
can be used to determine whether a particular
model fits the data acceptably well.
The best-known of these is the chi-square
test, showing the degree to which the "observed"
variance-covariance matrix for the items
resembles the "expected" variance-covariance
matrix, i.e., the matrix that is expected on the
basis of the fitted model parameters. Large
(statistically significant) differences between these two
matrices indicate that the model to be tested 
does a poor job in reproducing the observed 
item variances and covariances; conversely, 
small differences suggest that the fitted model
may well closely resemble the unknown pro- 
cess that generated the data. Further, these tests 
can be used to compare various factor models, 
e.g., a one-factor model to a two-factor model or, 
more relevant to the study of beta and gamma 
change, whether the same factor model applies 
across all time points in the study. 
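
The chi-square statistic described above is commonly derived from the maximum likelihood discrepancy between the observed matrix S and the model-implied matrix Sigma. The sketch below assumes the standard ML fit function F = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p and the scaling (N - 1) x F; individual programs may scale by N instead, and the helper names and example matrices are hypothetical.

```python
import math


def det_and_inverse(m):
    """Determinant and inverse of a square matrix via Gauss-Jordan elimination."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def ml_chi_square(observed, implied, n_obs):
    """(N - 1) times the ML discrepancy between observed and implied matrices."""
    p = len(observed)
    det_s, _ = det_and_inverse(observed)
    det_sigma, sigma_inv = det_and_inverse(implied)
    trace = sum(observed[i][k] * sigma_inv[k][i] for i in range(p) for k in range(p))
    f_ml = math.log(det_sigma) - math.log(det_s) + trace - p
    return (n_obs - 1) * f_ml


# Hypothetical example: two items that covary at .50, N = 500
observed = [[1.0, 0.5], [0.5, 1.0]]
independence = [[1.0, 0.0], [0.0, 1.0]]  # a model implying no covariance
print(round(ml_chi_square(observed, independence, 500), 1))
```

When the implied matrix reproduces the observed matrix exactly, the statistic is zero; the further the two matrices diverge, the larger it becomes.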


CFA, gamma and beta change 

As noted earlier, current research on beta and 
gamma change is strongly rooted in the con- 
firmatory factor-analytic tradition. In this tra- 
dition, gamma change is examined in terms of 
across-time differences in the number of fac- 
tors accounting for the data and in the pattern 
of factor loadings. Beta change is measured as 
change in the magnitude of the loadings of the 
items on the factors and/or change in the vari- 
ances and covariances of the latent variables 
and observed items. Using CFA, researchers 
can assess which model accounts best for the 
data at each respective occasion. If the same 
basic model applies (i.e., the number of fac- 
tors and sets of items associated with these 
factors is the same for each occasion), parts 
of the model (e.g., the factor loadings) can be 
constrained to be equal for all occasions. Com- 
parison of the fit of the constrained model to 
that of the unconstrained model then reveals 
whether the imposed constraint is empirically 
plausible (i.e., whether the constrained part of 
the model is invariant across time; if so, no 
gamma and/or beta change has occurred) or 
not (meaning that gamma and/or beta change 
has occurred). Below I present and illustrate 
a simple three-step procedure to examine the 
invariance of factor structures across time using 
data from a two-wave study. The procedure 
can easily be generalized to multiwave stud- 
ies. Similar procedures have been proposed 
by Kenneth Bollen (1989). Finally, note that 
Adam W. Meade, Gary J. Lautenschlager and 
Janet E. Hecht (2005) address the use of modern 
item-response theory in establishing structural 
invariance. 


3.3 A three-step procedure to examine 
structural invariance 


The aim of the first step is to see whether
the variance-covariance matrix for the variables
(items) of interest is equal for both occasions.
Across-time comparison of the correlation matrix
would be inappropriate, as the variances of the
items in the correlation matrix are standardized
to equal 1.00; analyzing the correlation matrix
would therefore minimize the chances of finding
across-time differences in item and/or factor
variances (i.e., beta change). A significant
difference between the variance-covariance
matrices indicates that (1) some form of gamma
change has taken place; the number of dimen- 
sions or the pattern of factor loadings may 
have changed across time; (2) some form of 
beta change has occurred; the variances and/or 
covariances among the latent factors and/or the 
items have changed; or (3) a combination of 
(1) and (2) has occurred. The analysis would 
then proceed with the second step, examining 
what the precise differences across the occa- 
sions are; is the basic factor structure the same? 
If so, are the factor loadings the same? And 
so on. Conversely, if the test statistics suggest 
that there are no statistically significant differ- 
ences among the variance-covariance matrices 
for both occasions, the associations among the 
items generalize across time, meaning that 
the factor structure has remained basically 
the same. 
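
Why the first step compares variance-covariance matrices rather than correlation matrices can be seen in a small numerical sketch. The item scores below are hypothetical: at Time 2 every response is simply twice as spread out as at Time 1, so the variances change while the ordering of respondents does not.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equally long lists (variance when x is y)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


# Hypothetical scores on two items at Time 1
item_a_t1 = [1, 2, 3, 4, 5]
item_b_t1 = [2, 1, 4, 3, 5]
# Time 2: same ordering of respondents, but doubled spread
item_a_t2 = [2 * v for v in item_a_t1]
item_b_t2 = [2 * v for v in item_b_t1]

# The correlations are identical across occasions ...
print(correlation(item_a_t1, item_b_t1), correlation(item_a_t2, item_b_t2))
# ... whereas the covariances (and variances) are not
print(covariance(item_a_t1, item_b_t1), covariance(item_a_t2, item_b_t2))
```

A comparison of the two correlation matrices would suggest that nothing has changed, whereas the variance-covariance matrices reveal the change in item variances.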

If the first step revealed that there are across- 
time differences in the factor structure, we must 
check in the second step whether the same 
basic factor structure applies to both occasions 
(in terms of the number of factors and the 
patterns of factor loadings). At both occasions 
a simple structure should have been reached, 
i.e., the simplest model (with the fewest
factors) that still accounts reasonably well for the
associations among the items should have been 
obtained. If the number of factors in the sim- 
ple structures is the same for both occasions, 
we must examine whether the pattern of factor
loadings is the same across time; for structural
equivalence to hold, the same set of items
must load on the same latent factor at both
occasions.

If the same basic factor structure applies to 
both occasions, the issue of beta change can be 
examined in the third step. To test whether beta 
change has occurred, the loadings of the items 
on the factors must be constrained to be equal 
across time. Comparison of the fit indexes for 
the constrained and the unconstrained model 
will then reveal which of these models fits the 
data better. If the assumption of equality of factor
loadings across time can be retained, the equal- 
ity of the variances and covariances among the 
latent factors (the latter only if we have more 
than a single factor) across time can be exam- 
ined, again by comparing a constrained model 
with an unconstrained model. This test focuses 
on the extent to which participants see greater 
integration or differentiation of constructs from 
one occasion to another. Finally, the equal- 
ity of the error variances of the items can 
be tested. 


3.4 Illustration: the structure of newcomer 
role ambiguity 


As an illustration of the issues addressed above, 
we present a small example in which we probe 
the structural invariance of a three-item mea- 
sure of role ambiguity among newcomers in 
their first job. These newcomers were interviewed
twice: once after they had been in their
jobs for on average six months (Time 1), and a
second time on average 30 months after
entering this job (Time 2). Participants were
recruited in eight European countries; total N
at Time 1 was 2643, of whom 1245 participants
(47.1% response) also participated in the second
round of data collection. At both occasions,
all participants completed a structured ques- 
tionnaire in their respective languages, address- 
ing work attitudes, work characteristics, and 
well-being. 

One of the scales in this questionnaire tapped 
the degree to which the participants experi- 
enced role ambiguity, referring to the degree to 
which they were uncertain about several mat- 
ters relating to their jobs (Table 9.1 presents 
the scale items). It is certainly not impossible
that the meaning of this concept changes across
time. After six months in their first job, newcomers
may still be in the process of finding
out what is actually expected from them and
how things should be done. In contrast, after
30 months they are reasonably experienced,
and it would seem likely that in that stage of
their careers they know perfectly well what to
do, how that should be done and how possibly
conflicting demands can be managed. This
process of worker maturation may well reflect
itself in the associations among the scale items.
Table 9.1 presents the correlations and
standard deviations for the items of our role
ambiguity scale.


Table 9.1 Correlations and standard deviations (on the diagonal) of three indicators of role ambiguity among
newcomers after being six months (Time 1) and 30 months (Time 2) in their first job (N = 1245)

                                              (1)          (2)          (3)          (4)          (5)          (6)
Time 1
(1) ...knows exactly what is expected (a)     1.11
(2) ...knows what s/he has to do (b)           .50         1.17
(3) ...procedures for handling things (c)      .42          .43         1.14
Time 2
(4) ...knows exactly what is expected (a)      .30          .20     [illegible]     1.17
(5) ...knows what s/he has to do (b)           .20          .27          .15          .64         1.20
(6) ...procedures for handling things (c)  [illegible]      .12          .23          .53     [illegible]     1.15

Note: All correlations significant at p < .05.
(a) The full item was “On my job, I know exactly what is expected from me”.
(b) The full item was “Most of the time I know what I have to do on my job”.
(c) The full item was “On my job there are procedures for handling everything that comes up”.

The within-wave reliability coefficients 
(Cronbach’s alpha) for these three items were 
.71 at Time 1 and .80 at Time 2, respec- 
tively. Although these alphas suggest that this 
three-item scale is reasonably reliable at both 
time points, it does not follow that it is the 
same phenomenon that is measured reliably 
by these items; factor loadings, factor and/or 
item variances may differ. This impression 
was confirmed through a comparison of the 
Time 1-Time 2 variance-covariance matrices.
A test of the hypothesis that the elements
of these matrices are the same across time
yielded a chi-square value of 37.88 with 6 df,
p < .05, thus rejecting this hypothesis: there
are statistically significant differences between
the variance-covariance matrices obtained at
Time 1 and Time 2, meaning that there is reason
to assume that gamma and/or beta change has 
occurred. 

In the first step we examined whether a 
single-factor model for the associations among 
the three items applied to both occasions 
(model M1 in Table 9.2). In this model, all 
three items are presumed to load on a single 
underlying latent factor; this basic factor struc- 
ture is expected to apply to both study waves. 
No constraints concerning the magnitude of the 
factor loadings, factor variances or item error 
variances are imposed. Table 9.2 shows that
this model fits the data acceptably well, as
evidenced by a non-normed fit index (NNFI)
of .91. Thus, it seems that a one-factor model
applies to both occasions; there is no evidence
for the presence of gamma change.


Table 9.2 Analysis of invariance across time of
role ambiguity among newcomers (N = 1245)

Model                                              Chi-square (a)   df   NNFI (b)
M1  Unconstrained model                                108.03        8     .91
M2  Factor loadings constrained across time            109.39       10     .93
M3  Factor variances constrained across time           126.40       11     .94
M4  Item error variances constrained across time       146.40       14     .93

(a) All chi-square values significant at p < .05.
(b) Non-Normed Fit Index (NNFI): values of .90 and over
signify acceptable fit.

In the second step we tested the across-time 
stability of the factor model. Model M2 exam- 
ines whether the magnitude of the factor load- 
ings is equal across time by constraining these 
factor loadings to be equal for both occasions. 
Comparison of the fit of M1 to that of M2 shows
that the latter model results in a slightly higher
chi-square value, but it also has more dfs.
The gain in degrees of freedom compensates for
the increase in chi-square points:
$\chi^2_{M2} - \chi^2_{M1} = 1.36$ with 2 dfs extra, p > .40. Thus,
the model in which the factor loadings are
constrained to be equal across time (M2) does
not fit the data significantly worse than the
unconstrained model M1, meaning that at both
occasions the three items load equally high on the
latent factor. This is also reflected in the value
for the NNFI that is higher for M2 than for M1.

Similarly, we can test whether the variance of 
the latent factor is the same across time. To this 
aim, model M3 (in which the factor variances
are constrained to be equal across time) is
compared against M2. In this case we find
an increase of 17.01 chi-square points with a
gain of only 1 df; thus, M3 fits the data worse
than M2, p < .01. Further inspection shows that
the factor variance increased somewhat across
time. However, the NNFI for M3 is slightly
higher than that for M2, suggesting that there
are no major substantive implications of the
significant increase in chi-square points; as our
sample is fairly large, even small and uninteresting
differences between the Time 1-Time 2
variance-covariance matrices are statistically
significant. Thus, whereas there are some indi- 
cations that the factor variance for role ambi- 
guity increased across time, for the time being 
we assume that the factor model is essentially 
the same. 

In the final step, we examined whether the 
item error variances were the same across time. 
To this aim, model M4 (in which the item error 
variances are constrained to be equal across 
time) is compared against M3. This
results in an increase of 20.00 chi-square points with
a gain of only 3 dfs, p < .01. Moreover, the value
of the NNFI decreases as compared to that of 
M3, suggesting that this statistically significant 
increase in chi-square points has substantive 
implications as well. Further inspection of the 
data shows that for all three items, the item error 
variances increased across time (which is also 
reflected in the standard deviations reported in 
Table 9.1). 
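
The nested-model comparisons described above can be retraced from the chi-square values and degrees of freedom reported in Table 9.2. The following is a minimal sketch in Python, not the procedure used in the original analysis: the helper names `chi2_sf` and `compare` are hypothetical, and the upper-tail probability is computed with a closed-form series that holds only for integer degrees of freedom.

```python
import math

# Chi-square values and degrees of freedom as reported in Table 9.2.
MODELS = {
    "M1": (108.03, 8),
    "M2": (109.39, 10),
    "M3": (126.40, 11),
    "M4": (146.40, 14),
}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df (hypothetical helper)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # starting point: df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # starting point: df = 1
    # Each pass raises the degrees of freedom by two.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def compare(restricted, baseline):
    """Chi-square difference test of a constrained model against a less constrained one."""
    chi_r, df_r = MODELS[restricted]
    chi_b, df_b = MODELS[baseline]
    d_chi, d_df = chi_r - chi_b, df_r - df_b
    return d_chi, d_df, chi2_sf(d_chi, d_df)


for restricted, baseline in [("M2", "M1"), ("M3", "M2"), ("M4", "M3")]:
    d_chi, d_df, p = compare(restricted, baseline)
    print(f"{restricted} vs {baseline}: chi-square difference = {d_chi:.2f}, "
          f"df difference = {d_df}, p = {p:.4f}")
```

Under these assumptions the M2 versus M1 difference is far from significant, whereas the M3 versus M2 and M4 versus M3 differences are significant at p < .01, in line with the conclusions in the text.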

These results indicate that there is some evi- 
dence of the presence of beta change across 
time, in that the dispersion of our partici- 
pants increases across time. However, the struc- 
ture of the measurement model (basic factor 
structure and factor loadings) does not change 
across time, meaning that participant scores on 
role ambiguity can safely be compared across 
time. The fact that Cronbach’s alpha increased 
across time is due to higher intercorrelations 
among the items at Time 2; it would seem
possible that this is caused by the increase of the
item variances across time, as the factor load- 
ings are essentially the same for both measure- 
ment occasions. Alternatively, it is possible that 
correlated error variances due to uncontrolled 
third factors account for the higher Time 2 cor- 
relations in Table 9.1. However, it is important 
to note that our results indicate that after par- 
tialling out these error variances (as we did in 
our confirmatory factor analyses), the reliability 
of the role ambiguity concept remains the same 
across time. 


4 Regression to the mean 


One interesting phenomenon that is inter- 
twined with the reliability of measures in a 
longitudinal context is the regression to the 
mean or the regression effect. This effect was 
already noted in 1924 by the famous statisti- 
cian Edward L. Thorndike. Beyond a certain 
medium range of initial values, responses tend 
to be the reverse of what one would expect: 
extremely low initial scores will be followed 
by an increase at follow-up, whereas
extremely high initial scores tend to be followed
by lower follow-up scores. Regression toward 
the mean is often discussed in terms of mea- 
surement unreliability. The score on a particu- 
lar measuring instrument may not exclusively 
reflect a participant’s true score on the under- 
lying concept but other (incidental) factors as 
well; it could be that extremely high (or low) 
initial values are due to these incidental fac- 
tors. If these incidental factors are absent dur- 
ing the follow-up measurement, participants’ 
scores are likely to regress toward the actual 
“true” score on the concept of interest. 
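
The unreliability account of regression to the mean can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the newcomer data: each simulated participant has a true score that does not change between waves, both observed scores add independent normally distributed error, and the function name, sample size, reliability value, and seed are all illustrative choices.

```python
import random
from statistics import fmean


def simulate_regression_to_mean(n=10000, reliability=0.6, seed=42):
    """Mean (Time 1, Time 2) scores for the lowest and highest Time 1 deciles."""
    rng = random.Random(seed)
    true_sd = reliability ** 0.5            # observed variance is fixed at 1
    error_sd = (1.0 - reliability) ** 0.5
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)              # identical at both waves
        time1 = true_score + rng.gauss(0.0, error_sd)     # incidental factors, wave 1
        time2 = true_score + rng.gauss(0.0, error_sd)     # incidental factors, wave 2
        pairs.append((time1, time2))
    pairs.sort(key=lambda pair: pair[0])                  # order by Time 1 score
    k = n // 10

    def means(group):
        return fmean([p[0] for p in group]), fmean([p[1] for p in group])

    return means(pairs[:k]), means(pairs[-k:])


(low_t1, low_t2), (high_t1, high_t2) = simulate_regression_to_mean()
print(f"lowest decile:  Time 1 = {low_t1:.2f}, Time 2 = {low_t2:.2f}")
print(f"highest decile: Time 1 = {high_t1:.2f}, Time 2 = {high_t2:.2f}")
```

Although no simulated participant changes in true score, the lowest Time 1 group scores higher at Time 2 and the highest Time 1 group scores lower, which is the pattern described above.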


Example: Role ambiguity among newcomers 

As an illustration, Figure 9.1 presents the Time 
1—Time 2 scores of the newcomers discussed 
in Section 3 on role ambiguity. For simplicity, 
all scores were rounded to the nearest integer. 
If regression to the mean applies, participants
having relatively high (low) scores at Time 1
should have lower (higher) scores at Time 2.
The two gray dots in Figure 9.1 present the
Time 1 and Time 2 means, respectively. As
high scores represent low levels of role ambiguity,
Figure 9.1 shows that the participants
on average became less uncertain about what
they had to do at their jobs. This general
tendency applies strongly to those who felt
extremely uncertain about their role (i.e., those
with score 1) at Time 1; their scores are on average
almost a full point higher at Time 2 (from
1.00 to 1.98).

Figure 9.1 Regression to the mean among a
longitudinal sample of newcomers, with respect to
role ambiguity

One could easily argue that especially the 
group with low Time 1 scores on role ambigu- 
ity would be likely to become less uncertain 
about their roles. For example, these partici- 
pants may ask their fellow workers or supervi- 
sor for clarification on their position or tasks. 
However, it is more difficult to understand why 
participants who were initially reasonably clear 
about their tasks would be so much more uncer- 
tain about their tasks at Time 2 (from 5.00 
at Time 1 to 3.54 at Time 2). Regression to 
the mean would be a much simpler explana- 
tion for these findings than any substantive 
interpretation. 


Example: Stability of rookie performances

Although regression to the mean is commonly 
considered a statistical artifact that distorts 
findings in longitudinal research, this effect 
may also be of substantive interest. Jim Taylor 
and Kenneth L. Cuave (1995) collected archival 
data on the performance of major league base- 
ball hitters who had outstanding rookie seasons. 
If regression to the mean applied (meaning
that this excellent performance was due to
chance), outstanding rookies should have
considerably poorer second seasons. Taylor and
Cuave (1995) found that the batting average of
outstanding rookies declined from on average
.300 in the first year to .276 in the sophomore
year to .269 in the third to fifth year of their
careers. Thus, outstanding rookie performances
often seem to be the result of temporary factors that
favorably influence their performance, meaning 
that the batting average is not a very reliable 
measure of the true qualities of rookie baseball 
players. 


Dealing with regression to the mean 

One possible strategy to deal with regression to 
the mean effects is to exclude participants with 
extreme scores on the initial measure of the 
phenomenon of interest; these are most likely 
to have their scores distorted by incidental fac- 
tors. For example, in the newcomer example 
in Figure 9.1, one would exclude participants 
with Time 1 scores of 1 and 5. The interme- 
diate scores (2-4) are probably less strongly 
influenced by regression to the mean than these 
extreme scores (although Figure 9.1 shows that 
participants with score 4 at Time 1 also expe- 
rience a major decrease in the degree to which 
they are certain about their roles). Alternatively, 
one may increase the reliability of one’s mea- 
sures (by adding items or replacing bad items— 
note that this strategy cannot be applied dur- 
ing an ongoing survey, as this would endanger 
the across-time structural equivalence of one’s 
measures). 


5 Unreliability of change scores
and the regression fallacy


In longitudinal research we are usually con- 
cerned about measuring and predicting the 
amount of change in the concepts that interest 
us (i.e., the prediction and description of alpha 
change, in the terminology used in the previous 
section). One natural way of measuring across- 
time change is the difference between the scores 
obtained at two time points. For example, in 
intervention studies researchers are interested
in the effects of a particular intervention. Clinical
psychologists, for instance, may attempt
to reduce feelings of burnout among teachers
by training their time-management skills;
comparison of the pre-intervention with the
post-intervention scores (preferably accompanied
by a comparison with the scores of a
control group) then reveals whether the inter- 
vention was effective. Similarly, if the con- 
cept of interest is income, then subtracting the 
income measured at Time 1 from the income 
measured at Time 2 represents the income gain 
(or loss) during the Time 1-Time 2 interval. 
This quantity is termed the difference score, 
gain score or change score. Intuitively attractive 
as the change score may seem, its use has gen- 
erated much concern among statisticians and 
methodologists. Prime among these concerns 
is the notorious unreliability of change scores; 
a related problem is known as the regression 
fallacy. 


The unreliability of change scores 

Statisticians have often attempted to discourage 
researchers from using change scores. Their cri- 
tique has most eloquently been put forward by 
Lee J. Cronbach and Lita Furby in their influ- 
ential 1973 paper, concluding that “there is no 
need to use measures of change as dependent 
variables and no virtue in using them” (p. 18). 
So what is the problem exactly? Assume that
the reliabilities and variances of two repeated
measurements of variable Y ($Y_1$ and $Y_2$, respectively)
are the same for both measurements. The
reliability of the $Y_2 - Y_1$ change score is then
given by

$$\frac{\rho_{YY} - \rho_{12}}{1 - \rho_{12}}$$

with $\rho_{12}$ denoting the correlation between $Y_1$
and $Y_2$, and $\rho_{YY}$ their reliability. Now, if $\rho_{12}$ is
positive (as is usually the case in longitudinal
research; scores on concepts tend to be quite
stable across time), it can be shown that the
reliability of the difference score is always lower
than the reliability of $Y_1$ and $Y_2$. Indeed, the
higher the intercorrelation between both measures
of Y, the lower the reliability of the difference
score becomes. This is illustrated in
Table 9.3, showing that if the reliability of $Y_1$
and $Y_2$ is very low (e.g., .30), it is impossible to
have a decent reliability for the $Y_2 - Y_1$ change
score (reliabilities are considered acceptable
when they are .60 or better; for diagnostic purposes,
values of at least .70 and preferably .90
are required). For moderate (.60) and high (.90)
reliabilities, the correlation between $Y_1$ and $Y_2$
becomes relevant; acceptable reliability for the
$Y_2 - Y_1$ change score is much easier to obtain
when the correlation between the repeated measures
is low. Unfortunately, the correlations
among subsequent measures of the same concept
tend to be high, usually being in the range
of .50-.60 for a one-year interval (depending,
of course, on the temporal stability of the concept
under study; e.g., personality traits tend
to be much more stable than moods). Thus, we
often need unrealistically high reliabilities for
the constituent variables to have a reasonably
reliable change score. In the absence of such
high reliability, the participants’ scores on this
change score may be little more than random
error.


Table 9.3 Reliability of the change score as a
function of the reliability of and intercorrelation
among two repeated measures

Reliability of        Correlation between $Y_1$ and $Y_2$
$Y_1$ and $Y_2$       .90    .60    .30    .10    .00
.90                   .00    .75    .86    .89    .90
.60                    -     .00    .43    .56    .60
.30                    -      -     .00    .22    .30
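
The entries of Table 9.3 follow from the standard expression for the reliability of a difference score when both measures have the same reliability and variance, namely (reliability - correlation) / (1 - correlation). A minimal sketch in Python (the function name is illustrative) shows the calculation; for example, it returns .86 for a reliability of .90 and a correlation of .30, and .22 for a reliability of .30 and a correlation of .10.

```python
def change_score_reliability(reliability, correlation):
    """Reliability of Y2 - Y1 when both measures share one reliability and variance."""
    if correlation >= 1.0:
        raise ValueError("correlation must be below 1")
    return (reliability - correlation) / (1.0 - correlation)


correlations = (0.90, 0.60, 0.30, 0.10, 0.00)
for rel in (0.90, 0.60, 0.30):
    cells = []
    for corr in correlations:
        if corr <= rel:
            cells.append(f"{change_score_reliability(rel, corr):.2f}")
        else:
            # With equal reliabilities the observed correlation cannot exceed
            # the reliability, so these cells are left empty.
            cells.append("  - ")
    print(f"{rel:.2f}:  " + "  ".join(cells))
```

The calculation makes the two levers visible: raising the reliability of the measures or lowering the correlation between them both increase the reliability of the change score.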

Of course, no one should be surprised by 
the typically low reliability of difference scores. 
The correlation between the Time 1 and Time 
2 true scores on Y (i.e., the correlation between 
$Y_1$ and $Y_2$ after correction for the measurement
error) is high for most regions of Table 9.3, as
can be seen after application of the disattenuation
formula $\hat{\rho}_{12} = \rho_{12} / \rho_{YY}$, where $\rho_{12}$ is
the observed correlation and $\rho_{YY}$ the reliability
shared by both measures. In particular, the true
score correlation $\hat{\rho}_{12}$ equals 1.00 along the diagonal
of zero reliability for the difference score.
Clearly, the difference score approach cannot 
be expected to detect any true score-change in 
the absence of such change. This reasoning has 
fueled two approaches to increasing the relia- 
bility of the change score. First, Lee Cronbach 
(1984) argued that in order to measure the 
often small across-time true score change reli- 
ably, researchers should increase the reliabil- 
ity of their measures (e.g., by replacing bad 
items—that correlate only weakly with the 
other items of this measure—with better items 
or by increasing the number of items for their 
measures). This means that researchers will end 
up in the upper regions of Table 9.3, where it 
is easier to obtain an acceptable reliability for 
the difference score. Second, Ronald C. Kessler 
(1977) proposed that researchers make sure that 
the interval between the study waves is suf- 
ficiently large for true-score change to occur. 
This interval should be such that the differ- 
ence between two consecutive measurements 
reflects at least partly true change and not just 
random fluctuations. Also note that increasing 
the interval between two measures of a variable
leads to a lower correlation between these,
meaning that researchers end up in the right
half of Table 9.3, where it is easier to obtain 
acceptable reliabilities for the difference score. 
Unfortunately, the practical applicability of this 
advice is rather limited, in that it is usually 
unknown what the best length for this interval
is. That is, an overly long interval may mean
that participants’ scores may change several 
times within this interval, making it difficult to 
relate the Time 1-Time 2 change to a predictor 
variable. 
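
The disattenuation step mentioned above can be made concrete with a short calculation. This sketch uses the general form of the correction, the observed correlation divided by the square root of the product of the two reliabilities, which reduces to dividing by the common reliability when both measures are equally reliable; the function name and the example values are illustrative.

```python
def disattenuated_correlation(observed_correlation, reliability_1, reliability_2):
    """Estimate the true-score correlation from an observed correlation."""
    return observed_correlation / (reliability_1 * reliability_2) ** 0.5


# Pairs of (reliability, observed correlation) taken from the grid of Table 9.3.
for rel, corr in [(0.90, 0.90), (0.90, 0.60), (0.60, 0.30), (0.30, 0.10)]:
    true_corr = disattenuated_correlation(corr, rel, rel)
    print(f"reliability {rel:.2f}, observed r {corr:.2f}: true-score r = {true_corr:.2f}")
```

When the observed correlation equals the reliability, the estimated true-score correlation is 1.00: there is no true change left for the difference score to measure.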

The combination of difference scores and 
the regression to the mean effect discussed 
in Section 4 can yield quite misleading find- 
ings. Kent M. Jennings and Gregory B. Markus 
(1977) were interested in the effect of hav- 
ing served in the army on feelings of trust 
towards the government. One possible strategy 
would have been to compare army veterans’ 
trust scores to those of people without army 
experience. However, this simple design will 
not do as both groups may initially differ as 
regards their trust in the government. It seems 
likely that veterans put more faith in the govern- 
ment than nonveterans, or else they would not 
have chosen to join the army (the participants in 
Jennings and Markus’ study had enlisted volun- 
tarily). Thus, initial differences in trust must be 
controlled. 

To this aim, Jennings and Markus conducted 
a two-wave longitudinal study, with the first 
wave being conducted in 1965 among high 
school seniors and the second in 1973. One 
way to analyze these data is to compare the 
Time 1 trust scores with the Time 2 scores, 
i.e., to compute change scores. However, as 
the amount of change tends to be negatively 
related to initial scores due to regression to the 
mean, this approach is fraught with difficulties. 
If veterans had higher Time 1 trust scores than 
others, the former group will presumably on 
average show smaller gains than the latter group 
on the null hypothesis of no effect. We would 
therefore conclude that serving in the army has 
a deleterious effect on trust in the government 
when, in truth, the null hypothesis is correct.
This has been termed the regression fallacy, 
attributing the change in the criterion variables 
(that is presumably largely due to measurement 
unreliability) to the effects of the independent 
variables in the study. 

In this case, one alternative for analyzing the 
data is the regressor variable approach. Instead 
of trying to relate the Time 1-Time 2 difference 
in Y to the scores on a predictor X, the Time 
2 measure in Y is regressed on its Time 1 mea- 
sure and X. Thus, the score on Y, is treated 
here like any other explanatory variable. In this 
vein, researchers can relate the change in Y to 
other variables (note that controlling for Y, will 
partial out what is constant in Y,, thus leaving 
what has changed in Y to be accounted for by 
the other predictor variables). 
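
The contrast between the change score and the regressor variable approach can be sketched in a small simulation. Everything in it is an assumption made for illustration, not a reanalysis of the Jennings and Markus data: true scores do not change, the grouping variable X has no effect on Time 2, and group membership is made to depend partly on the observed Time 1 score, so that the groups differ at Time 1 in the way the regression fallacy requires. Under other data-generating assumptions (for example, groups that differ in their stable true scores) the comparison can come out differently.

```python
import random
from statistics import fmean


def slope_intercept(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Part of y that is not linearly predictable from x."""
    slope, intercept = slope_intercept(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def simulate(n=20000, error_sd=0.7, seed=1):
    rng = random.Random(seed)
    y1, y2, x = [], [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)                  # no true change
        t1 = true_score + rng.gauss(0.0, error_sd)
        t2 = true_score + rng.gauss(0.0, error_sd)        # X has no effect on Time 2
        # Assumption: membership depends on the observed Time 1 score plus noise.
        group = 1.0 if t1 + rng.gauss(0.0, 1.0) > 0 else 0.0
        y1.append(t1)
        y2.append(t2)
        x.append(group)
    change = [b - a for a, b in zip(y1, y2)]
    gain_difference = (fmean([c for c, g in zip(change, x) if g == 1.0])
                       - fmean([c for c, g in zip(change, x) if g == 0.0]))
    # Regressor variable approach: effect of X on Y2 with Y1 held constant,
    # obtained by first removing Y1 from both X and Y2.
    effect_of_x, _ = slope_intercept(residuals(y1, x), residuals(y1, y2))
    return gain_difference, effect_of_x


gain_difference, effect_of_x = simulate()
print(f"difference in mean change scores (X = 1 minus X = 0): {gain_difference:.2f}")
print(f"coefficient of X when Y2 is regressed on Y1 and X:   {effect_of_x:.2f}")
```

In this setup the group with the higher Time 1 scores shows clearly smaller gains although X has no effect, whereas the coefficient of X in the regression of the Time 2 score on the Time 1 score and X stays close to zero.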


Change scores: Present and future 

At present, it appears that since the middle 
of the 1970s psychometrists’ negative attitude 
towards using change scores has tempered 
somewhat. According to K.K. Sharma and 
J.K. Gupta (1986), difference scores can be 
quite reliable under commonly present circum- 
stances, while Scott E. Maxwell and George 
S. Howard (1981) found that they may yield 
powerful tests of causal hypotheses, in spite 
of their unreliability. Similarly, Kim May and 
James B. Hittner (2003) argued that change 
scores yield powerful significance tests in that 
they reduce “true score” variance. Yet, although 
the difference score has been rehabilitated 
somewhat (see, for example, the discussions in 
Greenberg, Chapter 17, Twisk, Chapter 18, and 
Finkel, Chapter 29 in this volume), many cau- 
tious researchers will refrain from using change 
scores, if only because the readers (and especially
reviewers) of their work may still be suspicious
of this approach. In this sense, it may
take a long time before we will witness renewed 
interest in using change scores in applied 
research. 


6 Concluding remarks 


The present chapter dealt with several issues 
relating to the reliability of measures that are 
applied in the context of two- or multiwave 
studies. As in single-wave (cross-sectional) 
research, it is imperative that the measures 
that are used have an acceptable reliability. If 
such is not the case, associations between vari- 
able pairs will be underestimated, leading to 
often disappointing null findings that are actu- 
ally due to imperfect measurement. Apart from 
this problem (that applies equally strongly to 
cross-sectional and longitudinal research), we 
have shown that unreliability poses even more 
threats to the validity of findings in longitudi- 
nal research. In that sense, it is often sensible 
to invest much time and effort in selecting reli- 
able measures to be included in longitudinal 
research—perhaps even more than one would 
for a cross-sectional study. 


Chapter 10


Orderly change in a stable world: The 
antisocial trait as a chimera 
Gerald R. Patterson 


1 Introduction 


A developmental perspective implies that 
changes in social behavior are related to age 
in an orderly way. In the present report, the 
definition is expanded to include any changes 
in social behavior accompanied by explanatory 
variables that account for significant amounts 
of variance in the change score. The experi- 
ences that bring about change may or may not 
be related to the age of the child. 

By definition, the availability of longitudinal 
data sets would be prerequisite to the study of 
change in social development. What gives the 
developmental perspective credibility is that 
highly sophisticated techniques for analyzing 
longitudinal data sets have been well understood
for over a decade. For example, lucid
descriptions of time-series analysis and panel
analysis for inter- and intraindividual longitudinal
data sets were provided by Nesselroade
and Baltes (1979). Collins and Horn (1991) 
detailed further developments in the analysis of 
change, such as latent growth modeling (LGM), 
analysis of factor invariance, event-history analysis,
and the Guttman longitudinal simplex.
The analytic tools are there; in fact, they have 
been there for some time. 

Given the long-standing commitment to a 
developmental perspective and 40 years’ worth
of longitudinal data, one might expect that 
the construction of an empirically based the- 
ory for the development of children’s social 
behaviors would be well under way. In fact, 
there is no such accumulative data base. In the 
area of delinquency, for example, Farrington 
(1986) identified 11 well-designed longitudinal 
projects. The most salient finding to emerge was 
that measures of children’s antisocial behav- 
ior were significant predictors for adolescent 
delinquency (i.e., antisocial behavior is highly 
stable). Nevertheless, efforts to explain the sta- 
bility or to predict change in antisocial behavior 
(desistors or late starters) have not been par- 
ticularly successful (Farrington and Hawkins, 
1991). With few notable exceptions, even the 
more recent studies assiduously avoid the study 
of change. This author’s review of the empir- 
ical findings leads to the conclusion that the 
developmental emperor either has no clothes 
or, at the very least, is prone to indecent 
displays. 


1.1 Beyond stability coefficients


The meager returns from longitudinal stud- 
ies reflect the interactive contribution of three 
errors in research strategy. First, most investiga- 
tors have been trained within a myopic concep- 
tualization of the trait as a static or fixed unit 
that can be satisfactorily assessed by data from 
a single agent. In the discussion that follows, 
formulation of the trait concept is expanded 
to include changes over time. The measure- 
ment is based on multiagent, multimethod data. 
The second limiting factor is found in an over- 
weening reliance upon the correlation coeffi- 
cient as the analytic tool. Being wedded to 
both of these strategies leads unerringly to 
over-production of stability coefficients as the 
main output for longitudinal studies. The major 
limitation in strategy, however, lies in failure 
to include a formulation about the nature of 
change in social behaviors. This failure may 
be due to the fact that developmental theo- 
ries are exceedingly vague about what produces 
change and how to measure it. Most formu- 
lations consist of ambiguous metaphors about 
organismic variables, social learning, family 
systems, or attachment. Carrying out develop- 
mental research under the aegis of these three 
limitations is analogous to studying a dance 
through a tube. Not knowing where to look, 
we arbitrarily collect periodic data on the kind 
of shoes being worn. In so doing, we not only 
miss the point of the process, but produce yet 
another set of very high-order stability coeffi- 
cients saying that people tend to wear the same 
shoes throughout the dance. 

In the alternative perspective, the present 
report uses multiple indicators to define the 
trait concept. This strategy reflects the growing 
consensus that any single indicator measure 
would be systematically biased (Patterson, 
1992; Patterson and Bank, 1987; Sullivan,
1974). The contemporary shift toward using 
multiple indicators and confirmatory factor 
analyses (Bentler, 1980) supposedly provides 
a basis for constructing models that are more
generalizable. The utility of this strategy has
been demonstrated by replicated models from
across-sample (Forgatch, 1991) and across-site
(Conger, Patterson, and Ge, 1995) studies. From
this perspective, a trait such as antisocial behav- 
ior is embedded in a matrix of changing social 
behaviors (Patterson, Reid, and Dishion, 1992). 
Not only is the matrix changing, but some 
forms of the trait itself are changing. A study 
of this process requires having a theory about 
what produces the trait, how it will change, 
and when. Patterson, Reid, and Dishion (1992) 
detailed an empirically based theory about 
boys’ aggression. The theory specifies what 
mechanisms produce changes and, to some 
extent, when these changes should occur. These 
considerations led to an expanded definition of 
the trait score requiring time of emergence as a 
necessary piece of information. For example, an 
early emergence for the trait increases the risk 
for qualitatively new problems directly caused 
by the problem child’s coercive and antisocial 
behaviors. Literature that examines the covaria- 
tion between the trait measures and these qual- 
itative changes is reviewed in a section that 
follows. 

A core problem for a developmental theory 
of aggression is the need to explain the system- 
atic changes in form and in intensity of these 
behaviors. Changes in form occur at all stages 
of development, but particularly during ado- 
lescence. In this report, LGM is used to exam- 
ine the increment in growth for two antisocial 
behaviors (truancy and drug use), and covari- 
ates that explain why these changes for new 
forms come about are examined (Stoolmiller 
and Bank, in press). 

In this interpretation, the child’s antisocial 
trait is viewed as a chimera. According to 
biologists, a chimera is an unusual hybrid pro- 
duced by grafting tissue from different organ- 
isms. This metaphor is an apt descriptor for the 
antisocial trait. Each addition of a qualitatively 
new problem, and each change in the form of 
the coercive or antisocial behavior, might be 
thought of as a graft made onto the original trait
score. If changes do occur, how can we say we 
are describing the “same thing”? Analyses are 
presented demonstrating that qualitative addi- 
tions and changes in form define a second-order 
deviancy factor changing in an orderly manner 
over time. 

Before moving to a discussion of findings, one 
further problem in strategy needs to be raised. 
A frequent claim made for longitudinal design 
is that it can make an important contribution 
to evaluating causal status for developmental 
variables (Nesselroade and Baltes, 1979). Gollob 
and Reichardt (1987) extended this model to 
an autoregressive design where variable x mea- 
sured at T1 is partialed out of the measure for x 
at T2. They asserted that a panel design could be 
used to test for the causal contribution of vari- 
able y measured at T1 by demonstrating that it 
covaried significantly with future changes in x. 
Cross-lag correlations could be used to test the 
hypothesis that x measured at T1 could demon- 
strate a causal effect on y measured at T2. 
Rogosa (1979) carefully delineated problems in 
measurement and specification that typically 
make this a very weak test for causal status. He 
also strongly endorsed multiple measures and 
the use of structural equation modeling (SEM) 
rather than traditional multiple-regression anal- 
yses. Testing the causal status of parenting prac- 
tices was, in fact, one of the prime goals of 
the longitudinal Oregon Youth Study (OYS) 
(Patterson, 1988). The OYS was designed to ful- 
fill the requirements Rogosa outlined. As we 
applied the autoregressive format suggested by 
Gollob and Reichardt (1987) to our longitudi- 
nal data, however, we were immediately con- 
fronted with a paradox. Using latent constructs 
to measure child traits and parenting practices 
routinely generated stability coefficients rang- 
ing from .70 to .85 for two- to four-year intervals 
(Patterson and Bank, 1989). Correlations with 
causal constructs were usually well above the 
.40 to .50 range. Paradoxically, our efforts to use 
autoregressive panel models to test for causal
status were doomed to failure because our measures
were too good! Stoolmiller and Bank (in
press) point out that the combination of high
stabilities and high collinearity makes
significant cross-lag effects extremely unlikely.
They also point out that there are alternative 
analytic strategies which are more likely to 
be effective (e.g., LGM). The present author 
believes that, at best, SEM or LGM can pro- 
vide only a weak test of causal status. A devel- 
opmental theory must eventually be based on 
experimental evidence. For example, Dishion, 
Patterson, and Kavanagh (1992) and Forgatch 
(1991) have described longitudinal designs that 
included random assignment and experimental 
manipulations to test for causal status. 
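
The arithmetic behind this paradox can be sketched with the standard formula for a standardized regression coefficient when there are two predictors. The correlations below are hypothetical values chosen to resemble the magnitudes reported above (a stability of .80 and a concurrent correlation of .50); they are not OYS estimates, and the function name is an illustrative assumption.

```python
def cross_lag_beta(r_x1_x2, r_y1_x2, r_x1_y1):
    """Standardized coefficient for y(T1) predicting x(T2)
    after x(T1) has been partialed out (two-predictor regression)."""
    return (r_y1_x2 - r_x1_x2 * r_x1_y1) / (1.0 - r_x1_y1 ** 2)


# Hypothetical correlations, for illustration only.
stability = 0.80   # x at T1 with x at T2
concurrent = 0.50  # x at T1 with y at T1
lagged = 0.45      # y at T1 with x at T2

beta = cross_lag_beta(stability, lagged, concurrent)
print(round(beta, 3))
```

Under these assumed values, most of the lagged correlation is already accounted for by the stability of x, so the standardized cross-lag coefficient is only about .07.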

Longitudinal data from the OYS are used to 
address questions about quantitative and quali- 
tative change. The 206 families involved in the 
OYS live in high-risk (for crime) neighborhoods 
in a medium-sized metropolitan area. The 
recruitment procedures and sample character- 
istics were described by Capaldi and Patterson 
(1987). Each family participated in over 20 
hours of assessment at each probe, when the 
boys were in grades 4, 6, 8, and 10. 


2 Stable but changing 


When studying changes in children’s antisocial 
behavior, we must first establish a small island 
of stability. We begin by examining two very 
different facets of stability. First, the stability of 
the definition itself is considered; it may be that 
what is meant by stability changes as the indi- 
vidual moves from childhood through adoles- 
cence. The second question concerns the means 
for estimating the magnitude of stability coef- 
ficients. There is some reason to believe that 
the typical bivariate correlation of monoagent 
reports might overestimate stability. 


2.1 Stability in definition 


It may be that both the form and the defini- 
tion of antisocial behavior change as the child’s 
age increases. Eddy, Heyman, and Weiss (1991)
partialed out the effect of changes in form by 
constructing a pool of antisocial behaviors that 
could occur at any age. Maternal ratings of 
these 12 items were available for toddlers (age 
27 months) and for five-year-olds. The items 
were rank-ordered at each age according to fre- 
quency of use. The correlation of .78 showed 
that items most frequently used to describe tod- 
dlers were also more likely to be used for pre- 
school children. This same set of items, when 
scored for different samples of boys in grades 2 
through 8, showed comparable stabilities in def- 
inition. For example, the rank ordering of items 
for toddlers correlated in the .71 to .75 range 
with the rank ordering of items used by moth- 
ers to describe their adolescent sons. In each 
case, the most frequently checked items tended 
to be “disobedience” and “temper tantrums,” 
whereas the least frequently checked was 
“physical aggression.” What mothers perceive 
to be “most” and “least” antisocial remains con- 
stant over child and adolescent development. 
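
A minimal sketch of this kind of stability-in-definition index follows. It assumes a Spearman rank-order coefficient, which the text does not specify, and the endorsement counts for the 12 items are invented for illustration; they are not the data of Eddy et al. (1991).

```python
def ranks(values):
    """Rank items from most (1) to least frequently endorsed; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(a, b):
    """Spearman rank-order correlation for untied data."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical endorsement counts for the same 12 items at two ages.
toddler_counts = [50, 47, 40, 38, 33, 30, 27, 22, 18, 12, 9, 4]
five_year_counts = [48, 51, 35, 41, 29, 36, 20, 25, 10, 15, 5, 7]

rho = spearman_rho(toddler_counts, five_year_counts)
print(round(rho, 2))
```

The index compares the ordering of items, not of children, so a high value says only that the same behaviors are the common and the rare ones at both ages.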

In the Eddy et al. (1991) article, mothers’ 
stability in definitions for boys between grades 
4 and 8 was .96. For that same sample, the indi- 
vidual difference stability for mothers’ ratings 
of their sons was .65. By definition, how- 
ever, the error terms for the two sets of rat- 
ings were intercorrelated. Not only does this 
violate a fundamental assumption for appli- 
cation of correlational analysis, but it also 
inflates the magnitude of the correlation. How 
does one partial out the joint contribution 
of shared method variance and stability? As 
Rogosa (1979) and others have pointed out, the 
use of trait indicators based on reports from 
multiple agents and methods would enable 
us to disentangle the contributions of shared 
method and stability variances. This possibility 
is examined in the following section. 


2.2 Stable individual differences 


The preceding analyses showed there is a 
core set of interpersonal reactions defining the
antisocial trait that are stable over the period
from age 10 through age 14. Adolescent boys, 
however, are involved in some antisocial acts 
that younger boys seldom engage in, so the 
definition was expanded to include all of the 
antisocial acts that are performed often by ado- 
lescents. SEM was used to estimate the stability 
of antisocial behavior for the 206 boys in the 
OYS. The trait was defined by parent reports, 
teacher reports, and child self-reports at grades 
4 and 8. Details of the psychometric analyses 
from the grade 4 studies for each of the indica- 
tors and for the construct itself were presented 
by Capaldi and Patterson (1989). 

In the model, when the same methods were 
used at the two assessment probes, the error 
terms were allowed to covary. In doing this, 
the contribution of shared method variance can 
be removed from the estimate of stability. As 
shown in Figure 10.1, all of the factor loadings 
were highly significant at both points in time; 
in fact, the loadings showed at least configu- 
ral invariance. At both assessments, the highest 
loadings were for parent reports and the low- 
est were for child self-reports. The probability 
value of .63 for the chi-square test showed a 
solid fit between the data set and the a priori 
model. 

The traditional equation for test-retest
correlation (1 − r²) defines the error of measurement
for a given trait score. Although the current
stability path coefficient suggests that the
error of measurement might be about 31% of 
the variance, it is plausible that a significant 
proportion of this error estimate might reflect 
changes in level for a subgroup. A portion of the 
unexplained variance could reflect systematic 
changes in individual growth over time. In the 
next section I examine the possibility of system- 
atic changes under conditions of high stability. 
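
As a worked illustration of this arithmetic, the sketch below applies the 1 − r² rule in both directions. The 31% figure comes from the text; the function names are illustrative, and the coefficient of about .83 is simply the value implied by that figure, not one read from Figure 10.1.

```python
import math


def error_share(r):
    """Proportion of variance treated as measurement error under 1 - r**2."""
    return 1.0 - r ** 2


def implied_coefficient(error):
    """Stability coefficient implied by a given error proportion."""
    return math.sqrt(1.0 - error)


implied_r = implied_coefficient(0.31)
print(round(implied_r, 2))          # coefficient implied by 31% error
print(round(error_share(0.83), 2))  # error share implied by r = .83
```

The point made in the text is that this residual share need not be pure error: systematic growth in a subgroup would also lower the coefficient.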


3 Two developmental models 


A formulation about late-starting delinquents 
(Patterson, DeBaryshe, and Ramsey, 1989) sug- 
gested that there would be substantial numbers 
Figure 10.1 Stability of antisocial behavior from grade 4 to grade 8


who become antisocial for the first time during 
midadolescence. This would result in signifi- 
cant shifts in mean level for antisocial behav- 
iors; presumably, the resulting shifts in ordinal 
rankings would in turn contribute to lowered 
stability coefficients. Intraindividual growth 
curve analyses will be used to (a) demonstrate 
that such systematic shifts do occur and (b) 
identify a set of variables thought to bring this 
about. 

It was hypothesized that early- and late-onset 
antisocial boys represent two very different 
groups. Each group has different determinants 
and outcomes. A two-parameter latent growth 
model would seem useful in examining this 
general model. For example, one parameter (the 
intercept) would define where the process starts 
(i.e., the early onset boys). The other parame- 
ter would describe how intraindividual growth 
unfolds over time (i.e., the late-onset boys). 
In the present context, the intercept describes 
the individual differences in the antisocial trait 
at age 10. The second parameter constitutes 
an operational definition of late starter (i.e., 
nonproblem children who developed antisocial 
traits in early adolescence). 

Part of the beauty of the latent growth model 
is that it operationally defines early and late 
starters. In addition, it introduces covariates 
that might account for the variance in the
intercepts as well as the individual differences
in growth (i.e., provides a correlation test for 
potential determinants). The covariates provide 
a direct test for the assumption that deter- 
minants for one developmental phase (early 
starters) may be different from the determinants 
for another phase (late starters). Initially, Pat- 
terson et al. (1989) hypothesized that determi- 
nants for early starters would be provided by 
contingencies embedded in family interaction; 
these in turn are controlled by the effectiveness 
of parenting practices. Latent constructs for 
parental effectiveness in discipline and mon- 
itoring practices assessed at age 10 serve as 
covariates accounting for individual differences 
in intercept scores for the OYS. 

A second parameter in the LGM, the shape 
parameter, described the differences among the 
boys in intraindividual growth patterns for 
antisocial behavior assessed at grades 4, 6, 7, 
and 8. These late starters did not begin their 
antisocial careers until early adolescence; they 
showed few, if any, adjustment problems dur- 
ing childhood and possessed at least marginal 
social and survival skills (Patterson and Yoerger, 
1993). The assumption is that some of the 
late starters will be arrested, but few will be 
chronic delinquents, and they will have a bet- 
ter prognosis than early starters for moderate 
levels of adult adjustment. The mechanisms 
that determine the delinquent behavior for late
starters are an earlier than normal presence on 
the street (i.e., wandering) and a heavy commit- 
ment to the deviant peer group. This formula- 
tion was based on Stoolmiller’s (1990) use of 
the OYS longitudinal data to demonstrate that, 
during the interval from childhood to midado- 
lescence, changes in antisocial behavior covar- 
ied with changes in wandering and changes in 
involvement with deviant peers. The hypothe- 
sis about the involvement of deviant peers in 
direct training for delinquent acts is tested in 
a later section. Multiple indicators are used to 
define the constructs for wandering and involve- 
ment with deviant peers that would presumably 
covary with changes in individual growth for 
antisocial behavior (Rogosa and Willett, 1985). 


3.1 Changes in antisocial behavior 


For the entire OYS sample, teachers’ ratings for 
antisocial behavior showed a significant increase
in mean level from grade 4 to grade 5, but 
no significant change over the next four years 
(Patterson, 1992). Current studies using parent 
ratings and child self-report data showed the 
same general pattern (i.e., essentially a non- 
significant slope for measures from grade 4 
through grade 8). The finding of no increase 
in antisocial behavior from early to midado- 
lescence is consistent with the findings from 
teacher and adolescent ratings in the Chapel Hill 
longitudinal study (Cairns and Cairns, 1991). 

In the OYS, the same assessment battery was 
used at grades 4, 6, 7, and 8. At each point 
in time, the raw scores from teacher, parent, 
and child telephone interviews were added to 
generate a single score. The same set of items 
was used at each point in time. 


3.2 A two-factor latent growth model 


The results from the simultaneous test of 
the early- and late-starter models are summa- 
rized in Figure 10.2. Using a two-factor model 
requires that the correlation between the slope 
and intercept parameters be minimal or zero.
In fact, the data showed that the initial level
for the antisocial score was unrelated to slope 
changes. This is a crucial piece of informa- 
tion for a developmental model of antisocial 
behavior; where a child starts is not necessar- 
ily related to his or her future growth (in mean 
level). 

The error terms for each wave of measure- 
ment for the antisocial construct were set equal. 
The intercept was defined by the antisocial 
scores obtained at grade 4; the factor loadings 
for the antisocial measures were set at 1.0 for 
each wave. Based on prior work by Stoolmiller 
(1990), the factor loadings for the shape para- 
meter were set at zero for grade 4 and at 2 for 
grades 6, 7, and 8. All the error terms for the 
covariates were allowed to covary. 
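
The loading pattern just described can be sketched as follows. The loadings are those stated in the text (1.0 on the intercept at every wave; 0 at grade 4 and 2 at grades 6, 7, and 8 on the shape factor), whereas the two boys' factor scores are hypothetical values chosen only to contrast an early starter with a late starter.

```python
GRADES = [4, 6, 7, 8]
INTERCEPT_LOADINGS = [1.0, 1.0, 1.0, 1.0]  # fixed at 1.0 for each wave
SHAPE_LOADINGS = [0.0, 2.0, 2.0, 2.0]      # 0 at grade 4, 2 thereafter


def implied_scores(intercept, shape):
    """Model-implied antisocial score at each wave (error term omitted)."""
    return [
        li * intercept + ls * shape
        for li, ls in zip(INTERCEPT_LOADINGS, SHAPE_LOADINGS)
    ]


# Hypothetical factor scores, for illustration only.
early_starter = implied_scores(intercept=1.5, shape=0.0)
late_starter = implied_scores(intercept=0.0, shape=0.4)

for grade, early, late in zip(GRADES, early_starter, late_starter):
    print(grade, early, late)
```

With these loadings the shape factor describes a shift between grade 4 and the later waves, not a steady linear trend, which fits the operational definition of a late starter given above.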

The data showed that measures collected at 
the four points in time defined the same latent 
construct for antisocial behavior. Evidence for 
this assertion lies in the r² values (adjoining
each measured variable), which ranged from .68 
at grade 4 to .76 at grade 8. These values show 
that the latent construct loads at a very high 
level on the measures at each point in time. 

It was hypothesized that there would be a 
significant contribution by the discipline and 
monitoring constructs in accounting for initial 
level of child antisocial behavior. The findings 
were consistent with the hypothesis. Ineffective 
parental discipline (−.44) was associated with
higher intercept scores for antisocial behavior 
even after the contribution of inept monitoring 
had been partialed out. The comparable path 
coefficient for monitoring was −.37. Together,
the parenting practices accounted for 35% of 
the variance in the initial level of antisocial 
scores. Neither set of initial parenting skills 
covaried with late intraindividual changes in 
antisocial behavior. The results suggest that a 
single-minded focus on parenting skills is help- 
ful in understanding initial levels of aggression, 
but it does not tell us very much about which 
boys will be at risk for an increase in antisocial
behavior at midadolescence. 
Figure 10.2 Developmental models for early and late starters


The next hypothesis examined in Figure 10.2 
was that the boys who showed increases in 
wandering (i.e., unsupervised street time) and 
in involvement with deviant peers would also 
show increases in antisocial behavior. Changes 
in wandering and in involvement with deviant 
peers were expressed as a simple difference 
score (Stoolmiller and Patterson, 1995). The 
difference scores showed a mean increase in 
wandering of .09 from grade 4 to grade 8 and 
a mean increase in deviant peer involvement 
of .10. The increases in wandering contributed
significantly (.24) to the slope index for
increases in antisocial behavior. Increases in 
deviant peer involvement contributed heavily 
(.56) to changes in antisocial behavior even after 
the contribution of increased wandering had 
been partialed out. The combined contribution 
of the two variables accounted for 43% of the 
variance in the measures (shape parameter) of 
intraindividual growth in aggression. 

The nonsignificant chi-square value demon- 
strated an acceptable fit of the data to the a 
priori early/late starter model presented in the 
original statement (Patterson et al., 1989). The
implication is that there are two developmental 
paths for antisocial behavior. As hypothesized, 
each path is characterized by a different set 
of covariates. The path of the parenting skills 
model relates significantly to the start point, 
childhood aggression measured at age 10. The 
path of the deviant peer involvement model 
relates to antisocial behavior that begins in early 
adolescence. 

Early and late starters seem to have differ- 
ent determinants. It was also hypothesized that 
the two groups would differ significantly in the 
timing for their first arrest and for their risk 
of chronic offending (Patterson et al., 1989). 
Presumably, antisocial children would be at 
greater risk than late starters for both early 
police arrest and for chronic arrests during ado- 
lescence. The longitudinal data collected for the 
OYS provided strong support for both hypothe- 
ses (Patterson, Crosby, and Vuchinich, 1992; 
Patterson and Yoerger, 1993). 

The findings for the late starters’ intraindivid- 
ual growth in antisocial behavior suggest that 
some of the variance not accounted for in sta- 
bility estimates may be generated by systematic 
increases in aggression for subgroups of young 
adolescents. 


4 Analyses of qualitative shifts 


It was hypothesized that over time there are two 
major qualitative changes in problem behaviors 
that accompany the antisocial trait. One shift 
is in the form of the antisocial acts; the other 
involves the addition of nonantisocial problem 
behaviors. The assumption being tested is that 
these qualitative shifts are quantifiable and that 
they define an emerging second-order deviancy 
factor. 


4.1 The addition of new problem behaviors 


Coercive and antisocial acts elicit predictable
reactions from the social environment,
adding qualitatively new problems to
the developmental trajectory (Patterson et al.,
1992). The additions occur in an identifiable 
sequence of reactions by members of the child’s 
social environment. The sequence is initiated 
by the entrance of the antisocial child into the 
school setting. In that setting, the child’s coer- 
cive interpersonal style produces an immediate 
reaction. Coie and Kupersmidt (1983) showed 
that, in a newly formed group, other children 
began to label the antisocial child as “disliked” 
within 2 or 3 hours of contact. The second reac- 
tion to the child’s behavior is from the teacher. 
The child’s obdurate noncompliance to implicit 
and explicit rules means she or he spends less 
time on tasks when in the classroom and less 
time on homework assignments. The child’s 
academic failure is probably evident to him or 
her early on. By grades 3 or 4, the antisocial 
child has failed in two fundamentally impor- 
tant tasks: peer relations and academic skills. 
Patterson and Capaldi (1990) hypothesized that 
the effect of this dual failure is increased fre- 
quency of depressed moods. School failure, 
peer rejection, and depressed mood constitute 
a cascade of qualitative problems that add to 
the child’s burden. The link between prior 
antisocial behavior and academic failure and 
peer rejection has been examined by SEM in 
a series of analyses summarized by Patterson 
and Yoerger (1993). The relation between 
dual failure and depressed mood has been 
replicated in three SEM studies detailed by 
Patterson and Capaldi (1990) and Patterson and 
Stoolmiller (1991). 


4.2 Developmental changes in form of antisocial acts


The stability studies reviewed earlier imply that 
the antisocial acts of a five-year-old may be pro- 
totypic of the acts of the delinquent adolescent. 
It is evident, however, that there are profound 
changes over time in the form of coercive (e.g., 
noncompliance, threats, temper tantrums) and 
antisocial (e.g., stealing, lying) acts. How do 
we move from the five-year-old’s noncompliance
and temper tantrums to the adolescent’s
substance abuse, burglary, and shoplifting? We 
believe that many crucial changes in the form of 
antisocial acts occur during early adolescence, 
and the primary agents of change for these qual- 
itative shifts are members of the deviant peer 
group. During early adolescence, the training is 
fairly intensive and may involve multiple anti- 
social acts. If this is so, several new forms of 
antisocial behavior might change in a similar 
fashion over time. To test this hypothesis, data 
were examined for the changes in substance use 
and in truancy at grades 4 through 9. Substance 
use was defined by a single item, “uses alcohol 
or drugs” (never, sometimes, very often), from 
the Child Behavior Checklist (CBCL) (Achen- 
bach and Edelbrock, 1983) filled out by one or 
more teachers at each grade. The truancy vari- 
able was based on ratings for a single item, 
“skips school” (not true, sometimes true, often 
true), from the CBCL. The ratings were made 
by mothers in single-parent families and by 
both parents in intact families. For both vari- 
ables, there was about a fivefold increase from 
grade 4 to grade 9; much of the growth occurred 
between grades 7 and 8. The key hypothesis is 
that the individuals who show growth in one 
form will also be at significant risk for growth 
in the other. 

The results of the two-parameter LGM are 
summarized in Figure 10.3. The initial phases 
of the growth were characterized by a very 
high incidence of zero values. This violation 
of the assumption of normally distributed variables
is a major cause for concern. Although the
assumption of equal error variance could not 
be met, setting the error as proportional to vari- 
ance proved successful. The probability value 
of .15 for the chi-square value of 34.59 (df = 27) 
showed an acceptable fit of the a priori model 
to the data set. 

Only a few boys showed positive initial 
scores for the truancy and substance use constructs.
As shown by the path coefficient of
.34, there was a significant likelihood that these
early starters were involved with deviant peers. 
The timing of the involvement with deviant 
peers seems to be critical, as evidenced by the 
path coefficient of .58 between early involve- 
ment (age 10) and later growth in the new forms 
of antisocial behavior: the earlier the onset, the 
greater the future growth. However, increas- 
ing involvement with deviant peers also con- 
tributed significantly to individual difference 
variance in growth. Note that the timing and the 
growth in deviant peer involvement made sig- 
nificant contributions of the same magnitude. 
Increased wandering also contributed signifi- 
cantly. Taken together, the information from the 
three covariates accounted for 54% of the vari- 
ance in the slope factor. 


4.3 Quantifying qualitative shifts 


The behaviors that define the antisocial trait 
may serve as determinants for a host of new 
problems such as peer rejection, academic 
failure, and depressed mood. New forms (e.g., 
truancy, substance abuse, police arrest) are con- 
stantly being added. This raises the question 
of whether these changes themselves form an 
orderly pattern of change over time. One way 
of thinking about this problem is to consider 
each qualitative change as contributing to a 
second-order deviancy factor that has the anti- 
social trait as its core. As each qualitatively new 
problem emerges and the prevalence becomes 
noticeable, it should appear as a newly signifi- 
cant factor loading on a second-order deviancy 
factor. At younger ages, indicators for qualita- 
tive shifts such as substance abuse and arrest 
would describe so few members of the sam- 
ple that the factor loadings would be nonsignif- 
icant. Repeated factor analyses at early and 
midadolescence should show that each qualita- 
tive shift eventually loads significantly on the 
second-order deviancy factor. Each addition is 
but a new branch of what is essentially the same 
thing. 
Figure 10.3 A growth model for changes in form


By way of illustration, four qualitative shift 
variables from the OYS were each measured 
at three points in time. Two of the variables— 
antisocial behavior and academic failure—were 
latent constructs; the other two—police arrests 
and teacher ratings of substance use—were 
new forms. The findings for the data col- 
lected at grades 4, 6, and 8 are summarized 
in Figure 10.4. At each point in time, the 
antisocial construct serves as the core defining
variable; the estimated factor loadings for waves
1 through 3 were 1.0, 1.0, and .89, respectively.
Over time, the academic skills construct makes 
an increasing contribution to the deviancy fac- 
tor; the loadings shifted from .32 at the begin- 
ning of the process to a substantial .43, and 
then .60. The other variables representing qual- 
itative shifts (substance use and police arrest) 
also showed increased loadings on the deviancy 
factor. The data sets at all three points in time 
Figure 10.4 Changes in the structure of deviancy over time


provide a good fit to the model, as shown by 
the nonsignificant chi-square values. 

The findings demonstrate that the factor 
structure defining deviancy is altered signifi- 
cantly between age 10 and age 17. However, 
the second-order factor analyses suggest that 
these qualitative changes represent a pattern of 
orderly change over time. 


5 The trait as a chimera 


The antisocial trait defines an interpersonal 
style that may maximize short-term gains but 
adds to long-term increases in misery (Patterson 
et al., 1992). An understanding of the trait 
requires both a short-term and a long-term 
perspective. The second-order deviancy factor 
is one means for expressing in quantitative 
terms the nature of some of these long-term
changes and demonstrating that the changes are
systematic. 

The chimera metaphor implies that these 
qualitative changes are analogous to tissue 
grafts. The Greeks described the chimera as a 
fire-breathing creature that was part goat, part 
lion, and part snake. As shown in Figure 10.5, 
from a developmental perspective, the chimera 
begins as a goat and always retains its goat- 
like essence. This is analogous to seeing the 
antisocial trait as the underlying essence for 
the deviancy factor. Academic failure and peer 
rejection components constitute the addition of 
the lionesque countenance to what is essen- 
tially still a goat. By midadolescence, the addi- 
tions of substance use and police arrest produce 
an aroused society and complete the conversion 
of a simple goat to a fire-breathing monster with 
the tail of a snake. 
Figure 10.5 The chimera effect


6 Implications 


In summary, a developmental approach to 
the study of antisocial behavior requires not 
only longitudinal data and a theory but also 
a sensitive application of statistical analyses 
that consider intraindividual growth over time. 
The concept of growth must be expanded to 
include quantitative changes in mean level over 
time as well as qualitative changes generated 
by the same process. A trait is generalizable 
across time and across settings but, in a very 
real sense, it reflects an underlying dynamic 
process. 
Chapter 11


Minimizing panel attrition 


Heather Laurie 


1 Introduction 


This chapter examines survey nonresponse and 
attrition in the context of longitudinal designs 
where individuals may be asked to take part 
in a survey over an extended period and be 
interviewed at several points in time. Current 
practice for minimizing attrition in longitudinal
surveys is reviewed and recommendations
for good practice are made. While the focus
is on household panel survey designs, where 
typically all members of the household are 
interviewed and followed over time, many 
aspects apply equally to other longitudinal sur- 
vey designs. The chapter is set out in five main 
sections. We begin by defining attrition and 
briefly discuss the potential impact of attrition 
on data quality. Section 2 discusses the impact 
of survey design features on attrition in longitu- 
dinal surveys. Section 3 describes the process of 
attrition, the techniques which are commonly 
used to maximize response and minimize attri- 
tion and what is known about the success of 
these techniques. Section 4 examines one tech- 
nique in particular by looking at existing evi- 
dence on the effect of incentives on reducing 
nonresponse and attrition over time. Section 
5 concludes with some recommendations for 
good practice in the design and conduct of 
longitudinal surveys in order to minimize 
attrition. 


1.1 Defining attrition 


When respondents drop out of a longitudinal 
survey following the first round of data collec- 
tion this is called attrition, i.e., losses to the 
sample over and above natural losses through 
death. Attrition is a process which leads to a 
cumulative reduction in the initial sample size 
over time. Attrition is also of concern if it is 
systematically related to respondent character- 
istics as it presents the possibility of attrition 
bias affecting the accuracy of our estimates in 
substantive analysis.¹ Sample attrition (sometimes
also called panel mortality) is therefore
a special case of nonresponse which applies to 
longitudinal surveys and is an area where sur- 
vey organizations must develop and use addi- 
tional types of procedures to minimize losses 
to the sample. 

¹ See Kasprzyk et al. (1989) and Brown et al. (1996)
for a discussion of the impact of attrition bias on
substantive analysis.

Attrition has a number of implications for
data quality and the accuracy of estimates in
analysis. The first issue is sample size, particularly
for relatively small subgroups within the
population. If cell sizes become too small the
range of analysis possible with the data will
be restricted. Secondly, if attrition is nonrandom
from a statistically selected population,
we can no longer be sure that the remaining
sample members represent the population
of interest. We risk what is known as differ- 
ential or systematic attrition when those who 
drop out have specific demographic or other 
characteristics, a process which could lead to 
attrition bias (Kalton et al., 1990; Pannenberg 
and Rendtel, 1996; Neukirch, 2002; Lynn and 
Clarke, 2002). This is of particular concern if 
those characteristics are associated with the 
outcome of interest. For example, if people who 
move jobs frequently have higher levels of attri- 
tion than those in stable jobs we will underes- 
timate the rate of job change in the population 
over time. 
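
The job-change example can be made concrete with a small calculation. All figures below are hypothetical assumptions chosen for illustration (a true share of frequent job movers of 30%, with 60% of movers and 90% of others retained); the function name is likewise illustrative.

```python
def observed_share(true_share, retain_group, retain_rest):
    """Share of a group among remaining respondents after differential attrition."""
    kept_group = true_share * retain_group
    kept_rest = (1.0 - true_share) * retain_rest
    return kept_group / (kept_group + kept_rest)


# Hypothetical values, for illustration only.
true_share = 0.30  # frequent job movers in the population
observed = observed_share(true_share, retain_group=0.60, retain_rest=0.90)

print(round(observed, 3))
```

Under these assumptions the remaining sample understates the share of movers (roughly 22% against a true 30%). If both groups were retained at the same rate, the observed share would equal the true share whatever the overall loss.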

Longitudinal analysis is concerned with
events, transitions and changes over
time and with estimating the likelihood of those
transitions or events occurring for particular 
types of people. While there is some evidence 
that much of the attrition we observe tends to 
be largely random or has little effect on sub- 
stantive outcomes (Brown et al., 1996; Watson 
and Wooden, 2006), in common with nonre- 
sponse on cross-sectional surveys, there are par- 
ticular groups who are more likely to drop out 
of a longitudinal survey than others (Kalton 
et al., 1989; Groves and Couper, 1998). There 
are many statistical techniques for adjusting for 
nonresponse and attrition in panel surveys and 
once these adjustments have been made there 
may be little evidence of bias across a range of 
estimates (Kalton et al., 1989). Arguably, panel 
surveys are better placed to provide accurate 
weights due to knowledge about respondents’ 
characteristics and circumstances from earlier 
waves. However, while post-field adjustment is 
possible it is never ideal, with the key element 
in ensuring high-quality data being prevention 
of attrition in the first place. 
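
One simple member of this family of adjustments is a
weighting-class adjustment, in which respondents are
weighted by the inverse of the response rate in a cell
defined by characteristics observed at an earlier wave.
The sketch below is a minimal illustration under that
assumption; it is not the procedure used by any of the
surveys cited here, and the cell labels, counts and
function name are hypothetical.

```python
from collections import defaultdict


def attrition_weights(sample):
    """Weighting-class adjustment: inverse of the response rate within each cell.

    `sample` is a list of (cell, responded) pairs. `cell` is any hashable label
    built from earlier-wave characteristics; `responded` is True if the sample
    member was interviewed at the current wave. Returns {cell: weight} for
    every cell that has at least one respondent.
    """
    issued = defaultdict(int)
    responded = defaultdict(int)
    for cell, did_respond in sample:
        issued[cell] += 1
        if did_respond:
            responded[cell] += 1
    return {cell: issued[cell] / responded[cell] for cell in issued if responded[cell] > 0}


# Invented example: younger sample members are retained less often.
sample = (
    [("under 30", True)] * 60 + [("under 30", False)] * 40
    + [("30 and over", True)] * 90 + [("30 and over", False)] * 10
)
weights = attrition_weights(sample)
for cell, weight in weights.items():
    print(f"{cell}: weight {weight:.3f}")
```

After weighting, the respondents in each cell sum to the
number of cases originally in that cell. The adjustment
removes bias only to the extent that dropouts and
stayers within a cell are alike on the outcome of
interest.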

Attrition is caused by three main aspects of 
the longitudinal survey process: 


• Geographical mobility which leads to a failure
to trace sample members.

• A failure to contact sample members at a
known address.

• Refusal to take part in further rounds of interviewing
after at least one interview.


Attrition can be defined in a number of ways 
and display different patterns ranging from 
intermittent patterns of nonresponse across a 
number of sweeps or “waves” of the survey to 
a total loss from the sample altogether². Nonresponse
at one wave may be an indicator of
increased propensity to drop out of the survey 
altogether, but does not necessarily constitute 
attrition if the survey procedures allow the pos- 
sibility that individuals could be interviewed 
at a later wave. The implication is that anyone 
who drops out of a longitudinal survey (for rea- 
sons other than death) remains potentially eligi- 
ble for interview, even if the chances of gaining 
further interviews are low. 

For the purposes of this chapter we discuss 
minimizing attrition in terms of total losses 
to the sample due to a failure to trace and 
nonresponse due to refusals and noncontacts, 
aspects which require different approaches and 
strategies to maximize response throughout 
the survey process. While survey and field- 
work procedures are central to minimizing 
attrition, the first element to consider is how 
the survey design parameters will affect likely 
attrition rates. 


2 Longitudinal survey design 
and attrition 


Longitudinal surveys typically follow the same 
individuals over an extended period of time 
and what all longitudinal designs have in com- 
mon is that they will suffer from attrition to a 
greater or lesser extent. There are many designs 


² On household panel surveys each round of interviews
is typically called a “wave” while other longitudinal
designs, such as cohort designs, use “sweep”
for each round of data collection.
of longitudinal survey and the choice of design
depends largely on the research questions being 
posed and the data required. If the research is 
interested in following children as they develop 
across the life course, a cohort design com- 
prising a group of children sampled at birth 
with relatively infrequent data collection may 
be appropriate. If the interest is in income and 
poverty dynamics across the whole population, 
then a panel design with a sample drawn from 
the whole population and with more frequent 
interviews may be the best design. Buck et al. 
(1996) provide a good summary of the main 
longitudinal survey design options and the suit- 
ability of each design for collecting particular 
types of data. 

Some survey designs allow for attrition in 
the original sample design, adopting some form 
of periodic sample refreshment or by using a 
rotating panel design. A rotating panel design 
is where sample members are systematically 
dropped after a certain period in the survey 
and new sample members recruited. The US 
Census Bureau’s Survey of Income and Program
Participation (SIPP) and the Canadian
Survey of Labour and Income Dynamics (SLID)
both use a rotating panel design, although
SIPP has a relatively frequent rotation pattern
compared to SLID. The argument for this type 
of design is that it ensures the sample remains 
representative of the population at any given 
time point in the survey. Set against this are 
the research losses due to having a limited time 
span for any given sample member, making 
research which is interested in long-term out- 
comes problematic. 

It is not the purpose of this chapter to review 
the various longitudinal survey design options 
but it is important to remember that the sur- 
vey design will impact on attrition levels in 
a number of ways. As such, minimizing attri- 
tion begins with the design decisions made at 
the outset of the survey design process. The 
procedures and resources required to maintain 
contact with sample members and encourage 
cooperation over time should form an integral
part of the design process. Procedures designed 
to minimize attrition should take into account 
the design parameters of the study in order to 
tailor those procedures most effectively to meet 
the needs of the survey. 


2.1 Sample design and characteristics 
of the population 


Specific characteristics of the sample popula- 
tion under study may be a factor in determining 
attrition levels (Kalton et al., 1989). The age of 
sample members, their lifestyles, family, hous- 
ing, and employment situations will all affect 
not only their propensity to respond but also 
their likelihood of moving address (Lepkowski 
and Couper, 2002; Couper and Ofstedal, 2006). 
For example, a sample of younger people will 
typically be more mobile geographically than 
a sample of the retired population and as a 
result affect the ability of the survey organiza- 
tion to maintain contact with them. Similarly, 
a cohort design, where children are sampled at 
birth or at a particular age, may foster a greater 
sense of belonging and being part of a special 
group than a general population sample and so 
increase the likelihood of continuing coopera- 
tion with the survey. The British Birth Cohort 
Studies, for example, report high response rates 
after following individuals throughout their 
life and into their 40s and 50s (Hawkes and 
Plewis, 2006). 

There is some evidence of between-country
differences in response rates, most of which 
are likely to be due primarily to differences in 
survey design and fieldwork procedures rather 
than intrinsic cultural differences (de Heer, 
1999). Watson (2003) in an analysis of attrition 
in fourteen countries in the European Commu- 
nity Household Panel Survey (1994-2001) finds 
that the retention rates varied between 82% in 
Portugal and 57% in Ireland over the first five 
waves of the survey. Attrition rates are typi- 
cally lower in the US than in Western Euro- 
pean countries. On the Panel Study of Income 
Dynamics (PSID) in the US just over a quarter
of the sample had been lost after eight years 
between 1968 and 1975 (Fitzgerald, Gottschalk 
and Moffitt, 1998) compared to the German
and British household panel surveys which lost 
around 34% of their original samples over the 
first eight years of these panels and the Dutch 
socioeconomic panel which suffered a loss of 
50% of the original sample over the same period 
(Watson and Wooden, 2006). 
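
Cumulative losses of this kind can be translated into
an average yearly retention rate, which makes panels
observed over different periods easier to compare. The
sketch below assumes a constant retention rate across
eight yearly transitions; both the constant rate and
the count of eight transitions are simplifying
assumptions made here, and the function name is
hypothetical.

```python
def average_annual_retention(cumulative_loss, years):
    """Constant yearly retention rate that would produce `cumulative_loss` after `years`."""
    return (1.0 - cumulative_loss) ** (1.0 / years)


# Cumulative losses quoted in the text for the first eight years of each panel.
for label, loss in [("German and British panels", 0.34), ("Dutch panel", 0.50)]:
    rate = average_annual_retention(loss, years=8)
    print(f"{label}: about {rate:.1%} retained per year on average")
```

In practice losses need not be spread evenly across
waves, so the figure is best read as a summary rather
than as the retention rate achieved at any particular
wave.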

An upward trend in attrition rates in recent 
years has been noted (de Leeuw and de Heer, 
2002). For example, the SIPP, which has a rotating
panel design with interviews conducted every
four months over eight waves (32 months), lost 31%
of households recruited for the 1996 panel over 
the 32-month survey period compared to 22% 
for the panel recruited in 1984 (Watson and 
Wooden, 2006). 

For a face-to-face survey, the extent to which 
the sample is clustered may affect both initial 
response and subsequent attrition. The num- 
ber of clusters in the sample is important not 
only for determining design effects and effec- 
tive sample sizes but for ensuring sufficient 
coverage of interviewers across all areas being 
surveyed in a longitudinal survey. A clustered 
sample helps with the process of making the 
initial contact with households as interviewers 
can work households in the same area effi- 
ciently (Morton-Williams, 1993). As the panel 
progresses, the sample will de-cluster over time 
as households and individuals move and hav- 
ing trained interviewers covering all areas is 
important to avoid the geographically distant 
case being lost to the survey. For example, 
on the British Household Panel Survey (BHPS) 
individual movers to the remote Scottish High- 
lands and Western Isles are followed, with 
the costs of interviewer travel being factored 
into the overall survey costs (Lynn (ed.), 2006). 
Alternative strategies might include the use 
of mixed mode data collection strategies, e.g., 
using telephone interviews for remote cases as 
is done on the Household Income and Labour
Dynamics in Australia survey (HILDA) (Watson
and Wooden, 2004).


2.2 Frequency of interviewing 
and perceived burden 


The frequency of interviewing set out by the 
survey design is likely to impact on attrition. 
The first aspect to consider is simply main- 
taining contact with respondents. With frequent 
interviews it is generally easier to keep in 
touch with sample members, whereas a design 
with interviews at four- or five-yearly intervals 
might result in higher levels of attrition as it is 
more difficult to maintain contact with respon- 
dents (Lepkowski and Couper, 2002; Couper 
and Ofstedal, 2006). However, as Kalton et al. 
(1989) point out, tracing respondents is time- 
consuming, so a longer interval between inter- 
views can provide more time to carry out this 
work. Set against this is the level of burden 
imposed by more frequent interviews which 
may lead to “panel fatigue” and higher levels of 
attrition (Laurie et al., 1999; Kalton and Citro, 
1993). Respondents take a view on the level 
of burden any survey implies for them, weigh- 
ing up the costs and benefits of taking part 
from their own perspective, in other words the 
“opportunity costs” of taking part (Lynn et al., 
2005; Groves and Couper, 1998). Interviews conducted
at three-monthly intervals may be seen
as overly burdensome by respondents and lead 
to higher levels of attrition than a design with 
an annual or bi-annual interview, for example. 

Looking at the perceived future cost of tak- 
ing part in a longitudinal survey, Apodaca, Lea 
and Edwards (1998) found a 5% decrease in the 
response rate when respondents to the Medi- 
care Current Beneficiary Survey in the US were 
read a statement telling them the survey was 
longitudinal and they would be contacted a few 
times each year. Even though response may
be depressed at the wave at which respondents are told
that the survey is longitudinal, there is also
some evidence that overall response rates may 
be higher at subsequent waves (Lynn, Taylor 
and Brook, 1997). The decision about when and
how to tell respondents they are being recruited 
to a longitudinal survey is therefore not only an 
ethical issue in terms of gaining informed con- 
sent but one which has potential implications 
for both the initial response rate and subsequent 
attrition. 


2.3 Interview length and complexity 


Interviews which are long or seen as having 
subject matter which is too personal or overly 
intrusive by respondents may lead to higher 
levels of attrition (Groves and Couper, 1998; 
Hill and Willis, 2001; Kalton et al., 1990). 
The complexity of the questionnaire and how 
easy respondents find it to answer may also 
affect attrition. As Lynn et al. (2005) note, 
interview length and complexity is something 
which respondents often say is a factor in refus- 
ing to participate at second and subsequent 
rounds of a longitudinal survey. While longer 
questionnaires are in general associated with 
respondents being less cooperative at subse- 
quent contacts, the evidence on the effect of 
questionnaire length on attrition is somewhat 
mixed. Some studies have reported a decrease 
in attrition rates after a shortening of the ques- 
tionnaire while others have found either no 
effect or even a positive effect of a longer inter- 
view (Lynn et al., 2005). On SIPP, respondents 
with a shorter interview were more likely to 
drop out at the subsequent round than those 
with a longer interview (Galvin et al., 2000). 
The relationship between interview length, 
complexity, and interest in the research topic 
is therefore likely to be a complex one, where 
a longer interview may suggest the respon- 
dent has a greater interest in the survey and is 
therefore less likely to drop out (Watson and 
Wooden, 2006). 


2.4 Saliency of topic coverage 


A high level of perceived saliency can be 
an important factor in gaining respondent 
cooperation not only at the first wave of a 
longitudinal survey but at subsequent waves
(Groves et al., 2000; Dillman, 2000; Lynn et al., 
2005). Saliency is the degree to which the 
respondent feels the survey is relevant to them, 
either because it is concerned with their own 
life experiences or has some intrinsic value 
for the community as a whole. In a longitudi- 
nal survey, the decision to participate in future 
waves is in part dependent on the experience of 
having taken part in a previous interview, how 
much the respondent enjoyed taking part and 
found the subject matter interesting, as well as 
how important the respondent sees the survey 
as being for the wider community or society. If 
the experience was enjoyable and the respon- 
dent felt that the questions were relevant to 
their own life, they are more likely to take part 
at later waves (Hill and Willis, 2001; Groves and 
Couper, 1998). If the research is seen as valu- 
able for the wider community, this appeals to 
respondents’ sense of altruism and civic duty 
and can increase response rates (Dillman, 2000). 
Designing the questionnaire to minimize com- 
plexity for respondents and maximize the per- 
ceived saliency of the research is therefore an 
important element in minimizing attrition and 
ensuring future cooperation. 


2.5 Mode of data collection 


The mode of data collection used will also 
affect attrition rates, with surveys conducted 
face-to-face by interviewers in respondents’ 
homes tending to have higher response rates 
and lower attrition rates than those where the 
primary means of contact is by telephone, 
post, or web survey. Mixed mode data col- 
lection is often used in longitudinal surveys 
as a means of increasing response and offer- 
ing respondents a choice of how they would 
prefer to respond (Lynn et al., 2005). The 
German Socio-Economic Panel Survey (SOEP) 
has used a mixed mode approach from the 
outset of the survey in 1984, with respon- 
dents having a choice of either a face-to-face 
interview or returning a self-completion questionnaire.
There are also examples of surveys
which have started with a face-to-face mode 
and shifted to either another primary mode alto- 
gether or to some combination of mixed mode 
data collection strategy. The PSID was initially 
a face-to-face survey from 1968 to 1972, mov- 
ing to primarily telephone collection in 1973 
and later introducing CATI (Computer Assisted 
Telephone Interview) technology in 1993. The 
Health and Retirement Study (HRS) biennial
interview used a mixed mode design of face-to-face
and telephone interviews, with the initial
baseline interview being face-to-face and
follow-up interviews primarily by telephone, a
strategy which changed in 2004 so that the
majority of interviews were conducted face-to-face
(Juster and Suzman, 1995; Couper and
Ofstedal, 2006). The National Longitudinal Sur- 
vey of Youth (NLSY) offers respondents the 
choice of telephone or face-to-face approaches 
and the BHPS and HILDA both use telephone 
interviews as part of their refusal conversion 
procedures when a face-to-face interview has 
not been possible (Burton et al., 2005; Watson 
and Wooden, 2004). HILDA also uses tele- 
phone interviews for respondents who move to 
areas outside the initial sampling points where 
sending an interviewer would be uneconom- 
ical (Watson and Wooden, 2004). The use of 
web questionnaires is widespread for consumer 
panels but they are not yet common on other 
longitudinal surveys. However, as the penetra- 
tion of access to the internet increases and the 
problems of survey nonresponse accumulate, it 
is likely to be a technique that is exploited in 
the future³. Using a mixed mode design raises
data quality concerns about the potential for 
mode effects which may bias estimates, and 
these need to be balanced carefully against the 
possible benefits of increased response (Voogt 
and Saris, 2005). 


³ Mick Couper (2000) provides a useful review of the
use of web surveys. 


2.6 Following rules and sample 
management 


Panel surveys implement a set of rules about 
who to follow throughout the life of the sur- 
vey and how to define individuals or house- 
holds as eligible at a given wave (Kasprzyk, 
1989). These will typically cover the inclusion 
of new household members and the conditions 
under which sample members should be fol- 
lowed and remain eligible for interview. The 
following rules adopted for a given survey can 
have a direct impact on attrition in some cases. 
For example, the initial sample for a survey 
may not include people living in institutions 
but may have following rules which state that 
anyone moving into an institution remains eli- 
gible for interview and should be followed and 
interviewed. Institutions vary in terms of the 
difficulty of gaining access to respondents with 
managers or others often acting as gatekeepers, 
e.g., the manager or matron of a residential or 
nursing home for the elderly. In these types of 
circumstances attrition may therefore be higher 
due to lack of access to the respondent.

Decisions about sample management and
who to issue to field at each wave can have 
a marked effect on attrition rates. The issues 
are essentially to do with whether or not pre- 
vious round refusals and noncontacts should 
be issued to field for a further attempt at sub- 
sequent waves. If all refusals and noncontacts 
were automatically withdrawn from the sam- 
ple on the occasion of the first nonresponse, 
this would significantly increase attrition rates 
(Rodgers, 2002). Re-issuing previous noncon- 
tacts is not problematic but re-issuing refusals 
to field can raise ethical questions about when 
it is appropriate to ask people who have pre- 
viously refused for another interview. Many 
refusals are “situational”, i.e., there is a particu- 
lar circumstance at a given point in time which 
makes it difficult for the respondent to give an 
interview (Burton et al., 2005). This might be a 
temporary illness, bereavement, the birth of a 
new child, starting a new and stressful job, and
so on. In these types of circumstances where 
the refusal is not an objection to taking part 
in the survey per se, an interview can often be 
achieved at a later date. Alternatively, the inter- 
viewer may simply have called on the house- 
hold at a bad time and a subsequent approach 
may be more successful. 

In the context of a longitudinal survey, the
likely combined effects of the survey design,
including the characteristics of the sample,
following rules and sample management
decisions, mode of data collection, frequency
of interview, perceived burden, saliency, interview
length, and complexity of the questionnaire,
need to be assessed and balanced in order to maximize
response and minimize attrition at later waves
of the survey while continuing to meet the data
requirements of the study.


3 The process of attrition and 
techniques for maximizing response 


The process of attrition is closely linked with 
theories of nonresponse and response propen- 
sity (Dillman, 2000; Groves and Couper, 1998; 
Lepkowski and Couper, 2002) with the key dif- 
ference that respondents, having once taken 
part in the survey, base their future decisions 
on whether or not to cooperate on that experi- 
ence. Dillman sees survey response as a social 
exchange where “... the actions of individu- 
als are motivated by the return these actions 
are expected to bring” (2000, p. 14). The three 
key elements of this exchange are rewards, cost, 
and trust, where the rewards are what the indi- 
vidual expects to gain from taking part, the 
cost is what one gives up in order to take part 
(e.g., time), and trust is the expectation that 
the rewards will outweigh the costs in the long 
run. Groves and Couper (1998) provide an alter- 
native conceptual framework for understand- 
ing the decision to either cooperate or refuse 
to take part in a survey. The influences they 
include are the social environment, character- 
istics of the household(er), survey design fea- 
tures, interviewer attributes and behavior, and 
the interaction between the interviewer and
householder when contact is made. Lepkowski 
and Couper (2002) posit three key steps in gain- 
ing cooperation for any survey. First you must 
locate the sample member, second you must 
contact the sample member, and finally you 
must gain the cooperation of the sample mem- 
ber. This approach is useful when thinking 
about the attrition process in longitudinal sur- 
veys as the central element of any longitudinal 
design is that you are following and interview- 
ing the same people over time. 

There are many standard techniques for 
minimizing attrition in longitudinal surveys 
and survey practitioners are always looking at 
developing new and innovative ways of main- 
taining their samples. How these techniques are 
implemented varies across surveys depending 
on the requirements of the survey design and 
the judgement of the survey practitioner about 
what is most appropriate for a given type of 
population. It is useful when designing a set of 
procedures to see them as a package of different 
elements, each of which contributes in differing 
ways to the overarching objective of minimizing 
attrition as far as possible. How these elements 
are tailored depends on the specific needs of the 
survey, with some elements being more effec- 
tive in certain circumstances than others. 


3.1 Keeping track of sample members 
and maintaining contact 


At the first interview of a longitudinal sur- 
vey the process of locating sample members is 
identical to any cross-sectional survey and will 
depend largely on the sampling frame and con- 
tact mode used for the survey. The difference in 
a longitudinal survey is the need to collect addi- 
tional information at the first wave that may 
be needed for identifying and finding sample 
members at future waves of the survey. Rates of 
moving will vary between countries and also by
the type of population being surveyed, but as a 
general guide can be expected to range between 
10% and 25% of cases a year. The propensity 
to move varies with individual and household
characteristics such as age and stage of the life- 
cycle, with younger people and those in the 
process of family formation being more likely to 
move, for example. Having the ability to trace 
movers to a new address is therefore an essen- 
tial prerequisite of locating them at the follow- 
ing round of interviews. For surveys using a 
face-to-face interview, much of the tracing will 
happen on the ground during fieldwork when 
interviewers discover someone has moved and 
find their new address from the current residents
of the address. Where a new address cannot
be found by the interviewer, resources for
tracing them through other means are needed.


Collecting additional contact details 

A first and seemingly simple procedure is 
ensuring collection of full names and titles of 
all sample members at wave 1, including those 
who may not have been interviewed, such as 
children living in a sample household. As this 
is not something which is done on most cross- 
sectional surveys on grounds of maintaining 
confidentiality for respondents, it is sometimes 
overlooked. At the first round, and at each sub- 
sequent round of interviews, collect as much 
additional contact information as possible for 
respondents themselves, including alternative 
telephone numbers such as a work number 
and mobile telephone numbers, and an email 
address if they have one. Details of at least one 
and ideally two stable contact persons, such as a 
relative or close friend, should also be collected 
including their full name, address, telephone 
number(s), and the relationship of the contact 
person to the respondent. In a design where 
multiple people are being interviewed within 
a household it is advisable to collect different 
contact names for each respondent where pos- 
sible. For example, if a couple separate at a later 
wave and one partner cannot be traced it may 
be difficult to contact their former in-laws or 
a close friend of their former partner to get a
new address for them. At each wave of the survey
respondents should be asked how likely it
is they will move before the next interview as
this can provide advance warning to the survey 
organization about possible moves in the future 
so that tracing can begin early. 
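
The items listed above can be thought of as a single
contact record held for each sample member and updated
at every wave. The sketch below is one possible layout,
not a description of any survey's actual database; all
class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContactPerson:
    full_name: str
    address: str
    phones: List[str]
    relationship: str  # relationship to the respondent, e.g. "sister"


@dataclass
class SampleMemberContact:
    full_name: str
    title: str
    phones: List[str] = field(default_factory=list)  # home, work and mobile numbers
    email: Optional[str] = None  # only if the respondent has one
    contact_persons: List[ContactPerson] = field(default_factory=list)
    likely_to_move: Optional[bool] = None  # answer to the "likely to move" question

    def gaps(self) -> List[str]:
        """List the tracing items that are still missing for this sample member."""
        missing = []
        if not self.phones:
            missing.append("telephone number")
        if not self.contact_persons:
            missing.append("stable contact person")
        elif len(self.contact_persons) < 2:
            missing.append("second stable contact person")
        if self.likely_to_move is None:
            missing.append("likelihood of moving")
        return missing


# Invented example record with one contact person and no answer on moving.
member = SampleMemberContact(
    full_name="A. Example",
    title="Ms",
    phones=["01234 567890"],
    contact_persons=[ContactPerson("B. Example", "1 Sample Street", ["01234 000000"], "sister")],
)
print(member.gaps())
```

A check of this kind could be run after each wave to
flag cases where interviewers should ask again for
missing details at the next contact.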


Keeping in touch exercises (KITEs)

KITEs are exercises in which respondents are contacted
in some way between interviews in order to 
encourage a feeling of belonging amongst sur- 
vey respondents and to gain information about 
people who have moved address, have emi- 
grated, or even died since the last interview. 
KITEs are usually carried out through a mailing, 
including a letter, a brochure of key findings, a 
change of address card, or other materials rele- 
vant for the survey. The KITE can also be used 
as a vehicle for asking respondents to confirm 
their current address and notify the survey orga- 
nization of any moves out of the household. 
Returned mail for people who have moved also 
allows the survey organization to know that 
someone has moved and is likely to need trac- 
ing before the next fieldwork period begins. 
While there is no experimental evidence on 
the effect of providing feedback to respondents 
in the form of some brief survey findings, anec- 
dotally respondents report they appreciate this 
and it is seen as a means of fostering an under- 
standing of the research amongst respondents 
(Dillman, 2000). The assumption here is that 
if respondents have evidence of the value of 
the research as well as findings that demon- 
strate some relevance to their own life, they 
will be more inclined to continue taking part. 
Some surveys also use birthday cards or Christ- 
mas or other religious festival cards as a means 
of keeping in touch and fostering loyalty to 
the survey. In Dillman’s theoretical framework 
of social exchange and his “tailored design” 
method (Dillman, 2000), providing details to 
respondents or sending thank-you letters or 
other items showing appreciation for their par- 
ticipation and providing information about how 
the data are being used is a way of reward- 
ing and developing trust amongst respondents. 
Developing a recognizable “brand” for the survey
through the use of consistent colorways
or survey logos on letters, leaflets, and other
survey materials is also a technique designed
to encourage loyalty to the survey and help
respondents feel part of the overall project.


Tracing during and between fieldwork periods

During fieldwork where a face-to-face interview
is being attempted, interviewers become
a major resource for tracing. Interviewers are 
able to enquire on the ground from current resi- 
dents, neighbors, friends, and relations and are 
often most successful at finding new addresses 
for movers. Interviewers can attempt to contact
respondents by telephone using all available
numbers (mobiles, work numbers, etc.),
check directory enquiries, and telephone
contact persons given at a previous wave by the
respondent. Where tracing on the ground fails,
more centralized tracing procedures take over. 
Tracing respondents between fieldwork peri- 
ods to ensure that address details are as up 
to date as possible before going into the field 
is referred to as a prospective “forward trac- 
ing method” by Burgess (1989). As mentioned 
in the previous section, this is often combined 
with a KITE. In order to encourage respon- 
dents to tell the survey organization when they 
move address between interview points, a con- 
tact point, ideally a named person, with a phone 
number and email address should be provided 
to respondents. Change of address cards which 
respondents can return postage-free to notify a 
change of address are also used by most sur- 
veys, and websites designed for respondents 
can also incorporate a change of address form. 
On some surveys, respondents are given a small 
incentive for returning or notifying a change of 
address. On the BHPS respondents are paid a 
£5 incentive for notifying a change of address 
between interviews (Laurie et al., 1999) and 
the PSID pays $10 to respondents who return a
card either verifying or updating their address 
between interview points. Couper and Ofstedal
(2006) found that this had a significant posi- 
tive effect on tracing during the 2005 PSID data 
collection. They found that fewer than 6% of 
respondents who verified their address between 
waves needed tracing during fieldwork, com- 
pared to 30% of those who did not return the 
postcard. In addition, those who returned the 
postcard required significantly fewer calls to 
contact at the following wave than those who 
did not return the card, representing savings in 
fieldwork costs as well as tracing costs. 
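
The two percentages reported for the PSID can be used
for a rough calculation of the tracing workload at a
given postcard return rate. In the sketch below the
sample size and return rates are invented, the 6%
figure is treated as an upper bound for those who
returned the card, and the function name is
hypothetical. The calculation also assumes the rates
would hold whatever the return rate, which need not be
true if people who return cards are less mobile to
begin with.

```python
def expected_tracing_cases(n_sample, return_rate,
                           p_trace_returned=0.06, p_trace_not_returned=0.30):
    """Expected number of cases needing tracing during fieldwork.

    `return_rate` is the share of the sample returning the address card.
    The two probabilities default to the PSID figures quoted in the text.
    """
    returned = n_sample * return_rate
    not_returned = n_sample - returned
    return returned * p_trace_returned + not_returned * p_trace_not_returned


# Invented scenarios for a sample of 1,000 cases.
for rate in (0.0, 0.5, 0.8):
    cases = expected_tracing_cases(1000, rate)
    print(f"return rate {rate:.0%}: about {cases:.0f} cases to trace")
```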

If possible, it is also prudent to maintain 
a complete historical record of all previous 
addresses for sample members as previous co- 
residents of the untraced mover can be con- 
tacted and may have a new address for the 
sample member. Providing interviewers with 
the contact name details given by respondents 
at the previous interview so they can use these 
during fieldwork if they are unable to find a 
new address otherwise, can be a useful tech- 
nique. This has the advantage of reducing time 
delays as interviewers are able to attempt to 
trace some respondents without having to con- 
tact the field office which in turn frees up staff 
time for tracing the more difficult cases inter- 
viewers are unable to locate. 

Couper and Ofstedal (2006) argue that sur- 
veys should, where possible, collect systematic 
information on the steps involved in the tracing 
process and the outcomes of each step so that 
these data can be used to model the tracing pro- 
cess and better understand the effectiveness of 
the differing approaches and techniques used. 
Having better information on the tracing pro- 
cess would also allow procedures to be tailored 
and targeted for particular respondents or sit- 
uations more efficiently than is currently the 
case on most surveys. If the population being 
surveyed is from a specific demographic or 
occupational group, is particularly mobile, or 
has not been contacted for a protracted period, 
then special procedures tailored to those popu- 
lations should be developed (see, for example, 
Wright, Allen and Devine, 1995; Menendez,
White and Tulsky, 2001). 

The success of tracing will depend on the 
quality of the tracing information available, 
whether the move is long distance,
and whether the move is a whole household or 
partial household move, i.e., one or more sam- 
ple members have moved but others are still 
resident at the last known address. In the UK, 
Buck (2000) found that of all movers 63% were 
short distance (within the local authority area), 
21% were medium-distance moves (between 
local authority districts in the same region) and 
the remainder were long-distance moves. In 
general, moves within the local area and par- 
tial household moves are easier to trace than 
long-distance whole household moves (Laurie 
et al., 1999). 


Tracing using public records and linking to 
administrative data sources 
There are a variety of publicly available admin- 
istrative sources which can be used to help with 
tracing respondents. These include voting regis- 
ters, vehicle or property registers, the web, and 
telephone directories, for example. Some sur- 
veys also use media appeals, something which 
was done on the UK Birth Cohort Studies (Lynn 
et al., 2005). There may also be administra- 
tive records which are not publicly available, 
such as welfare, benefit, immigration, defense, 
health, or housing records, which can be used 
for tracing respondents. This raises some ethical 
issues and it can be difficult to gain permission 
to access such data, but if it is possible it can be 
an effective means of finding sample members 
who cannot be traced in any other way. 
Having links to death registers is also valuable
as this enables the survey organization to
be notified of, or to confirm, the death of a sample
member. During fieldwork it can be difficult to
establish whether or not a sample member is 
still alive and therefore whether or not they are 
still within the eligible population, so having 
links to death registers is helpful. Depending
on national legal requirements, links to register
databases of this type may require the
permission of respondents and study protocols 
should be designed with this in mind if any 
data linkage is considered (Jenkins et al., 2004). 
Depending on the laws of privacy in a given 
country, it may also be possible to trace peo- 
ple through private or commercial sources such 
as credit records or through using private agen- 
cies such as debt collection agencies or pri- 
vate investigators. The web is also becoming 
an increasingly used resource for finding indi- 
viduals, either through search engines or using 
commercial databases. Clearly, not all of these 
avenues would be free so the choice of methods 
will depend on the survey budget as well as any 
ethical and legal constraints involved. 
Allowing an extended fieldwork period after 
the main fieldwork has been completed to 
enable tracing of difficult to find cases is an 
effective strategy. It has implications for field- 
work operations as relatively small numbers of 
cases remain active for an extended period but 
is an important strategy for minimizing losses 
and attrition bias, particularly as the most diffi- 
cult to trace respondents tend to have particu- 
lar characteristics. With suitable procedures in 
place tracing is usually fairly successful with 
location rates of over 90% for most movers. 
For example, on the PSID 18% of families 
(n = 1441) interviewed in 2003 had to be traced 
in 2005 with just 48 families not being success- 
fully traced (Couper and Ofstedal, 2006). The 
Health and Retirement Study in the US (HRS) 
had to trace 11% of respondents in 2004, with 
just 1.3% remaining untraced. Tracing rates 
are slightly lower on two of the major Euro- 
pean panel surveys. The BHPS located 94% of 
the 15% of sample members who needed trac- 
ing between 2003 and 2004 and the German 
SOEP located 96% of the 14% who needed 
tracing between 2003 and 2005 (Couper and 
Ofstedal, 2006). Most importantly, the organi- 
zational and staff resources to carry out tracing 
between and during fieldwork periods must be 
in place, something which adds to the cost of
carrying out a longitudinal survey but which is 
also recognized as a critical element of mini- 
mizing losses to the sample. 
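
The tracing figures quoted above can be put on a common footing with a little arithmetic. The sketch below is illustrative only. The function names are hypothetical, and it assumes that n = 1441 is the number of PSID families that needed tracing and that the BHPS and SOEP percentages are shares of the whole sample.

```python
def location_rate(needed_tracing: int, not_traced: int) -> float:
    """Share of the cases needing tracing that were successfully located."""
    return 1 - not_traced / needed_tracing


def untraced_share(share_needing_tracing: float, located_share: float) -> float:
    """Share of the whole sample lost because tracing failed."""
    return share_needing_tracing * (1 - located_share)


# PSID: 1441 families needed tracing and 48 were not found
# (an assumed reading of the figures in the text).
psid_rate = location_rate(1441, 48)

# BHPS: 15% needed tracing, 94% of those were located.
bhps_loss = untraced_share(0.15, 0.94)

# SOEP: 14% needed tracing, 96% of those were located.
soep_loss = untraced_share(0.14, 0.96)

print(f"PSID location rate: {psid_rate:.1%}")
print(f"BHPS untraced share of sample: {bhps_loss:.2%}")
print(f"SOEP untraced share of sample: {soep_loss:.2%}")
```

On these assumptions the PSID location rate is a little under 97%, and failed tracing costs the BHPS and SOEP less than 1% of their samples in the periods cited.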


3.2 Making contact 


After the first wave of a longitudinal survey, 
noncontact rates are in general fairly low, with
mature face-to-face panel surveys reporting
noncontact rates of less than 2% in most cases.
Having made contact at the first wave of the 
survey, the address, names of sample members, 
and characteristics of the household are known, 
all of which increase the chances of success- 
fully contacting respondents at later waves. On 
longitudinal surveys the procedures for mak- 
ing contact include most of the standard means 
used for cross-sectional surveys. In particular 
an advance letter from the survey organiza- 
tion which presents the survey as a legitimate 
research exercise, tells respondents when to 
expect the interviewer to call, and provides con- 
tact numbers and assurances of confidentiality 
is generally seen as good practice. Specifying a 
minimum number of calls which interviewers 
must make, including the number of evening 
and weekend calls required before declaring a 
household as a noncontact, has also been shown 
to improve response rates (Morton-Williams, 
1993; Campanelli et al., 1997). Persistence does 
pay off and interviewers should keep trying 
noncontacts throughout the fieldwork period to 
increase the chances of contacting people who 
happened to be away on holiday or business in 
the earlier stages of fieldwork or who are rarely 
at home. 
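
A minimum-call rule of this kind can be expressed as a simple check on the call record. The sketch below is a hedged illustration: the `Call` record, the definition of an evening call, and the thresholds (six calls, two evening, one weekend) are all assumptions made for the example, not values taken from any survey discussed here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Call:
    when: datetime
    contact_made: bool


def is_evening(call: Call) -> bool:
    # Assumption: "evening" means a weekday call at 18:00 or later.
    return call.when.weekday() < 5 and call.when.hour >= 18


def is_weekend(call: Call) -> bool:
    # Saturday and Sunday are weekday() values 5 and 6.
    return call.when.weekday() >= 5


def may_code_noncontact(calls, min_calls=6, min_evening=2, min_weekend=1) -> bool:
    """True only if no contact was made and the minimum call pattern is met."""
    if any(c.contact_made for c in calls):
        return False
    return (
        len(calls) >= min_calls
        and sum(is_evening(c) for c in calls) >= min_evening
        and sum(is_weekend(c) for c in calls) >= min_weekend
    )


# Hypothetical call record for one household.
first_five = [
    Call(datetime(2024, 3, 4, 10, 0), False),   # Monday morning
    Call(datetime(2024, 3, 5, 19, 0), False),   # Tuesday evening
    Call(datetime(2024, 3, 6, 14, 0), False),   # Wednesday afternoon
    Call(datetime(2024, 3, 7, 18, 30), False),  # Thursday evening
    Call(datetime(2024, 3, 9, 11, 0), False),   # Saturday morning
]
all_six = first_five + [Call(datetime(2024, 3, 11, 16, 0), False)]  # Monday afternoon

print(may_code_noncontact(first_five))  # too few calls so far
print(may_code_noncontact(all_six))     # minimum pattern now satisfied
```

Even when the rule is satisfied, the text above suggests that interviewers keep trying until the end of fieldwork, so a check like this marks the earliest point at which a noncontact outcome could be recorded rather than a signal to stop.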

One advantage of a panel or longitudinal sur- 
vey is that information about when a successful 
call was made at the previous wave is available 
and can be fed forward and given to interview- 
ers so they can plan the best time to approach 
the household. This is particularly important 
where a respondent works nights or shifts of 
some kind or has some specific requirements 
which they have notified to the survey organization,
e.g., will only be interviewed on the
weekend, no evening calls, must be telephoned 
in advance for an appointment, etc. Interview- 
ers should be encouraged to maintain as much 
flexibility as possible during fieldwork in order 
to fit in with the schedule of the respondent, 
even if this means making multiple calls to 
the household to interview different household 
members. Keeping records of any contacts with 
respondents between interview points so that 
these can be passed to interviewers where nec- 
essary can also help with the process of making 
contact. For example, if news of a bereave- 
ment, serious illness, or holiday dates has been 
passed to the survey organization these are use- 
ful pieces of information for the interviewer to 
know before they attempt the household and 
can increase their chances of making contact 
most efficiently. In the setting of a centralized 
telephone unit, the call pattern is determined 
by the system which again should use any pre- 
vious information to tailor the approach as far 
as possible (Bennett and Steel, 2000). 
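
Feeding forward the previous wave's call record can be as simple as finding the time slot in which contact was most often achieved. The following is a minimal sketch under stated assumptions: the slot labels and the `best_call_slot` helper are hypothetical, and a real system would also take account of any preferences the respondent has notified.

```python
from collections import Counter


def best_call_slot(previous_calls):
    """Return the slot with the most successful contacts at the last wave.

    previous_calls is an iterable of (slot, contact_made) pairs, where slot
    is a label such as "weekday evening". Returns None if no call succeeded.
    """
    successes = Counter(slot for slot, contact_made in previous_calls if contact_made)
    if not successes:
        return None
    return successes.most_common(1)[0][0]


# Hypothetical call record for one household at the previous wave.
previous_wave = [
    ("weekday morning", False),
    ("weekday evening", True),
    ("weekend", False),
    ("weekday evening", True),
    ("weekend", True),
]

suggested_slot = best_call_slot(previous_wave)
print(suggested_slot)
```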


3.3 Gaining and maintaining cooperation 


Using experienced, well-trained interviewers 
is clearly important for maximizing response 
rates on any survey. For longitudinal surveys, 
interviewers need to be trained in additional 
procedures such as tracing movers as well as 
response maximization techniques designed to 
maintain complete response histories for as 
many respondents as possible. The approach 
to the household and the ability to explain the 
purposes of the survey, to respond effectively 
to any concerns expressed by respondents and 
to convey an understanding of the importance 
of the longitudinal design are central elements 
in gaining respondent trust and commitment to 
the survey (Campanelli et al., 1997). As Groves 
and Couper (1998) argue, the interviewer must 
be able to tailor their approach to the individual 
respondent, something which is facilitated by 
having prior knowledge of the circumstances of 
individuals and households. Interviewers must
also be sensitive enough to recognize when
they should retreat from a household before a
refusal hardens (Morton-Williams, 1993). Using
previous response histories of sample members
to model who is most likely to refuse at the
following wave so that targeted strategies can 
be developed for these cases can also help to 
minimize losses. 

For many respondents, the interview is seen 
as an enjoyable event and it is not uncommon 
on longitudinal surveys for respondents to be 
waiting for the interviewer to call. There is 
some evidence that maintaining continuity of 
interviewers has a positive effect on maintain- 
ing response rates. Hill and Willis (2001) found 
that having the same interviewer was the most 
significant factor predicting future response 
with roughly a 6% increase in response rates 
where this was the case. The argument is that 
interviewers and respondents build up a rap- 
port which encourages participation and that 
when the interviewer changes, this link is bro- 
ken to some extent. So there may be a loyalty 
and trust that is built on the personal rela- 
tionship between the interviewer and respon- 
dent which is independent of the level of trust 
the respondent has for the organization car- 
rying out the survey. In contrast, Campanelli 
and O’Muircheartaigh (2002) found that once 
area effects are properly controlled there is 
no interviewer continuity effect on response, 
even though significant variation in the con- 
tinuity effect remained among refusing indi- 
viduals, possibly due to unmeasured aspects 
of the interviewer such as their skill on the 
doorstep. Nonetheless, interviewer continuity 
is a strategy which is employed by most of the 
major household panel surveys which use face- 
to-face interviewing. Where telephone inter- 
views are used, this type of continuity is 
not normally possible as interviewer turnover 
rates tend to be higher within telephone units. 
Even if it were possible, it may not have any 
effect on response as the telephone contact is
by definition less personal and more anonymous
than a face-to-face contact (Budowski and
Scherpenzeel, 2005). 

As discussed in the previous section, ensur- 
ing interviewers maintain flexibility in terms of 
making multiple calls at the same household 
to complete all interviews and being prepared 
to fit in with the circumstances of individual 
respondents is important. Close monitoring of 
fieldwork progress has been shown to increase 
response rates (de Leeuw and de Heer, 2002). 
This may be due to the fact that interviewers 
are aware their performance is being monitored, 
but monitoring also allows early warning of any 
problems during fieldwork so that these can be 
addressed quickly and not left until the end of 
fieldwork when it may be too late to intervene 
in any way. Relevant information held about 
the household or respondent by the survey orga- 
nization should be provided to interviewers to 
help with the approach to the household and 
also be provided during fieldwork if the respon- 
dent notifies a change or difficulty of some 
kind, e.g., a house move or illness. Interview- 
ers should also be asked to provide their own 
assessment of how the interview went, how 
cooperative the respondent seemed to them, 
and in the case of refusals, the reasons given by 
the respondent for refusing, information which 
is important for decisions about refusal conver- 
sion and whether the sample member should 
be withdrawn from the sample or attempted at 
a later wave. 

The survey should also be designed to col- 
lect what Couper (1998) terms paradata, which 
can include information about the characteris- 
tics of the interviewer, observations of the local 
area, observations about physical barriers pre- 
venting access to the address, and details of not 
only call patterns but of the interaction with 
respondents at each call. These data can then 
be used to develop what Groves and Heeringa 
(2006) have recently termed “responsive survey 
designs” where paradata are actively used dur- 
ing data collection to assess when each phase 
of the survey process has reached its capacity
in terms of response and what strategies or
additional features could be used to increase 
response for the remaining sample. This might 
include having the most productive interview- 
ers attempt the most difficult cases, a special- 
ized refusal conversion program or increased 
incentives for respondents, for example. 


Refusal conversion 

Refusal conversion is a common practice on 
both cross-sectional and longitudinal surveys 
but arguably is more important in the context 
of a longitudinal survey where the cumula- 
tive response rate over time is a critical quality 
indicator for the survey. Refusal conversion is 
where respondents who have initially refused 
to take part in the survey are contacted again 
during the current interviewing period to see if 
there is anything that can be done to encourage
participation. Respondents refuse for a variety
of reasons, some of which can be
accommodated in some way. For example, if
a respondent says that they cannot take part 
due to being too busy it may be possible to 
have the interviewer call at a time that suits 
the respondent. If the respondent did not like 
the interviewer who called on them for some 
reason or would prefer an interviewer with 
a different gender, for example, this can usu- 
ally be catered for by the survey organization. 
Refusal conversion can be carried out face-to- 
face or by other means such as telephoning the 
respondent. Refusals will be due to a range of 
reasons, from simply not being interested in 
the survey to a specific situation such as ill- 
ness or bereavement (Burton et al., 2005). Spe- 
cialist interviewers who are trained in refusal 
conversion should deal with these cases and 
having sufficient information about the circum- 
stances of the refusal will help with tailoring the 
subsequent approach. Of course a proportion 
of respondents will refuse completely to have 
any further contact and such wishes should be 
respected and the respondent withdrawn from 
the sample at that point. 
As noted in Section 2.5 using mixed mode
data collection can be useful for response max- 
imization and if used during refusal conversion 
can offer respondents an alternative means of 
completing the interview which may suit them 
better. Offering respondents a choice of mode 
can enable the respondent to be kept within 
the interviewed sample and there is evidence 
that such strategies are effective in maintaining 
overall sample numbers over extended time- 
frames (Burton et al., 2005). 
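
The reason small gains from refusal conversion matter on a longitudinal survey is that wave-on-wave response rates compound. The sketch below illustrates this with hypothetical rates. It assumes monotone attrition (nobody who drops out rejoins), which is a simplification, since many panels re-issue earlier refusals and noncontacts.

```python
from itertools import accumulate
from operator import mul


def cumulative_response(wave_rates):
    """Running product of wave-on-wave response rates."""
    return list(accumulate(wave_rates, mul))


# Hypothetical rates: 70% at wave 1, then 90% of the previous
# wave's respondents at each of the next three waves.
baseline = cumulative_response([0.70, 0.90, 0.90, 0.90])

# The same panel if refusal conversion lifts each later wave by two points.
with_conversion = cumulative_response([0.70, 0.92, 0.92, 0.92])

print(f"Baseline cumulative rate at wave 4: {baseline[-1]:.1%}")
print(f"With refusal conversion:            {with_conversion[-1]:.1%}")
```

With these made-up figures, a two-point improvement at each follow-up wave raises the wave 4 cumulative rate from about 51% to about 54.5%.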


4 The use of incentives to minimize 
attrition 


There is an extensive literature on the use of 
incentives in cross-sectional surveys but less 
in the context of longitudinal surveys*. The 
cross-sectional evidence shows that cash incen- 
tives are effective in increasing response, even 
though this varies by survey mode and the type 
of incentive strategy used. Pre-paid monetary 
incentives given unconditionally in advance of 
the interview are the most effective in increas- 
ing response compared to a monetary incen- 
tive which is dependent on response or a gift. 
And any incentive is better than no incentive 
(Church, 1993; Singer et al., 1999; James and 
Bolstein, 1992; Couper et al., 2005). 

Incentives are more effective at increasing 
response on surveys which are burdensome 
(Lynn and Sturgis, 1997) and are also known 
to be more effective on surveys which typi- 
cally have lower response rates and where the 
saliency of the research may not be high for 
respondents (Groves et al., 2000). Incentives 
work primarily by reducing refusals and have 
little effect on noncontact rates (Singer et al., 
1999). One concern when using incentives is 
whether data collected from respondents who
may be less cooperative is of poor quality. However,
there seems to be little effect on data
quality in terms of sample composition and
response distributions where either cash or a
gift is offered (Couper et al., 2006). Nor do monetary
incentives appear to adversely affect data
quality as measured by the levels of item nonresponse
or the effort expended in the interview
measured by the number of words given to verbatim
items (Singer, Van Hoewyk and Maher,
1998; Willimack et al., 1995).

*Laurie and Lynn (2006) provide a useful summary
of what is known about incentives in both
cross-sectional and longitudinal contexts and current
practice on some of the major panel surveys.

On longitudinal surveys, in the absence of 
experimental evidence, it is difficult to disen- 
tangle the effect of the incentive from the sur- 
vey procedures discussed in previous sections, 
some of which may have significant impacts 
on response rates. Some experimental evidence 
exists but the results are somewhat mixed. 
Overall, current evidence suggests that incen- 
tives can be effective in reducing attrition over 
multiple waves of a survey, and that mak- 
ing changes through introducing an incentive, 
offering higher amounts and targeting of vari- 
ous kinds does increase response even though 
these effects vary depending on the survey con- 
text. As with cross-sectional surveys, pre-paid, 
unconditional monetary incentives are most 
effective in increasing response, an effect which 
holds across multiple waves (James, 1997; Mack 
et al., 1998). However, the incentive needs to 
be sufficiently high to reduce attrition over 
time, with some evidence that smaller mone- 
tary incentives have no effect over the longer 
term (Mack et al., 1998). Others have found 
there is a positive and enduring effect on sub- 
sequent wave response for nonmonetary incen- 
tives, where entry into a lottery was offered 
during the life of a survey (Scherpenzeel et al., 
2002). There is also some evidence of lower lev- 
els of item nonresponse where incentives are 
used on longitudinal surveys and a reduction 
in interviewer effort in terms of the number of 
calls required (James, 1997; Mack et al., 1998). 
Incentives appear to have a differential effect by 
demographic characteristics, with those on low
incomes, with low educational qualifications
and from ethnic minority backgrounds respond- 
ing to the incentive more than other groups 
(Mack et al., 1998). Increasing the amount of 
incentive paid during the life of a panel has 
also been shown to increase response rates, in 
part by giving a tangible signal to respondents 
that their continued participation is appreci- 
ated (Laurie and Lynn, 2006). 

Targeting strategies have been found to be 
effective, especially where previous refusals 
have been offered an incentive to take part at a 
later wave (Martin et al., 2001; Kay et al., 2001; 
Rodgers, 2002). One-off, large payments or “end 
game” payment strategies to increase response 
from the least cooperative sample members at 
the first wave of a longitudinal survey have 
also been used with some success (Juster and 
Suzman, 1995). Even though we do not know 
how successful these are in delivering long- 
term commitment to the survey, one study sug- 
gests that a large payment at the first wave 
had no effect on increasing or decreasing later 
response relative to others who initially refused 
and were persuaded to take part by other means, 
nor did the large incentive at wave 1 induce 
an expectation that large incentives would be 
offered in later waves of the panel (Lengacher 
et al., 1995). Targeting raises issues of equity 
and fairness to respondents, who may react neg- 
atively if they know that other sample mem- 
bers are receiving more than themselves, even 
though the evidence from one study suggests 
that this is not necessarily problematic (Singer, 
Groves and Corning, 1999). However, in the 
context of a household survey where all mem- 
bers are being interviewed, the use of differ- 
ential incentive payments may be problematic. 
This is an area that deserves further enquiry as 
there may be unintended consequences of per- 
ceptions of inequity and maintaining the good- 
will of survey respondents is paramount. 

There are many areas where we have lim- 
ited knowledge about the longer term effects 
of incentives even though some studies suggest 
there may be long-term beneficial effects on
reducing attrition and potential bias through 
using incentives. However, further experimen- 
tal work to establish the effect of incentives 
on longer term attrition, sample composition 
and data quality, the best targeting strategies 
to use, and the effect of introducing, increas- 
ing or changing the way incentives are deliv- 
ered during the life of a longitudinal survey is 
still required. The use of incentives inevitably 
incurs a direct survey cost, so the likely benefits 
in reducing attrition rates need to be weighed 
carefully against the costs. 
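
One simple way to frame that trade-off is the incentive cost per additional respondent retained. The sketch below is a rough illustration only: the function name and all figures are hypothetical, it assumes a pre-paid unconditional incentive sent to every sample member, and it ignores offsetting savings such as fewer interviewer calls.

```python
def cost_per_extra_respondent(sample_size, incentive, rate_without, rate_with):
    """Total incentive cost divided by the number of extra respondents retained."""
    extra = sample_size * (rate_with - rate_without)
    if extra <= 0:
        # The incentive retained nobody extra, so the cost per case is unbounded.
        return float("inf")
    return sample_size * incentive / extra


# Hypothetical panel: 5000 sample members, an incentive of 10 currency
# units each, and a response rate that rises from 88% to 92%.
cost = cost_per_extra_respondent(5000, 10, 0.88, 0.92)
print(f"Cost per additional respondent retained: {cost:.0f}")
```

In this made-up case each additional respondent costs about 250 currency units in incentives, a figure that would then be set against the value of the reduced attrition and any savings in fieldwork effort.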


5 Conclusion and best practice 
guidelines 


Attrition is a concern for any longitudinal sur- 
vey, with high levels of attrition having the 
potential to significantly affect data quality and 
the long-term viability of the survey. Attrition 
rates are commonly used by data analysts and 
funders as a measure of the success of a survey 
and a critical indicator of survey quality. Any 
longitudinal survey design needs to include 
a package of procedures to minimize attrition 
together with the necessary resources required 
to implement these. While surveys vary in the 
techniques they use to minimize attrition there 
are a number of elements which should be con- 
sidered, including: 


• the impact the survey design is likely to have
on attrition such as the frequency of interviewing,
the length and complexity of the
questionnaire, the subject matter of the survey,
and the mode of data collection

• questionnaire design to minimize complexity
for respondents, maximize saliency, maintain
the interest of respondents, and make the
interview an enjoyable experience

• tailoring survey procedures to suit the characteristics
of the sample and population
being surveyed and designing a package of
measures to minimize attrition which are
most appropriate to the needs of the survey

• the following rules for the survey and the
decisions about sample management, particularly
the re-issuing of previous refusals and
noncontacts where possible as these can have
a significant impact on attrition

• how to introduce the purpose of the survey
to respondents, in particular the longitudinal
nature of the survey and what will
be expected of respondents who agree to
take part

• collection of sufficient contact details at
the first interview to facilitate later tracing,
including full names of all sample members,
contact names for family or friends outside
the household, telephone numbers including
mobile numbers and email addresses

• procedures to trace respondents between
and during fieldwork and for maintaining
complete historical records of all previous
addresses for sample members

• investigating and exploiting the full range of
tracing avenues available, including publicly
available records such as voter registers and
directory enquiries as well as other administrative
records and commercial databases
where possible

• updating address records between interview
points to increase the chances of making contact,
reduce the amount of tracing that needs
to be done during fieldwork, and reduce the
number of calls required by interviewers

• the use of Keeping in Touch Exercises
(KITEs) between interviewing points to maintain
contact with respondents and provide a vehicle
for them to provide information about moves
or other changes in their circumstances (e.g.,
a recent bereavement or illness)

• thanking respondents for their participation
and providing them with details of findings
from the survey to develop trust, loyalty, and
commitment to the survey

• “branding” of survey materials to encourage
a sense of belonging to the survey

• providing respondents with a named contact
person, telephone number, email address and
change of address cards so they can contact
the survey organization with queries or notify
changes in their circumstances

• recording details of all contacts with respondents
between interview points so that relevant
information can be passed to interviewers
for the next round of fieldwork

• a dedicated website for respondents with
information about the survey and including a
change of address form and a feedback comments
field

• collection of call record data which can be fed
back to interviewers at the following round to
improve the chances of making contact and
reduce the number of calls required

• collecting systematic data on the tracing process
itself to allow more efficient tailoring of
procedures

• ensuring interviewers are experienced and
well trained in tracing techniques, strategies
for contacting respondents, and maximizing
response

• ensuring interviewers are flexible throughout
the fieldwork period, prepared to fit in with
the needs of respondents, and keep trying
noncontacts until the end of the fieldwork
period

• in a face-to-face survey, ensuring interviewer
continuity where possible

• setting up systems to monitor fieldwork
closely so that any problems can be detected
early

• allowing an extended fieldwork period to
trace movers and carry out refusal conversion

• the use of mixed mode data collection strategies
to maximize response

• a refusal conversion program during fieldwork
which collects systematic data about
the refusal conversion process so that these
data can be used to model the most successful
strategies for particular types of
respondents

• the use of monetary and/or nonmonetary
incentives and whether it is practical to target
these in any way.
Chapter 12


Nonignorable nonresponse 
in longitudinal studies 
E. Michael Foster and Anna Krivelyova 


1 Introduction 


Attrition represents one of the most serious 
threats to both the external and internal validity 
of research in psychology and psychiatry. For 
example, the internal validity of a clinical trial 
may be compromised if those lost to follow- 
up differ systematically from those remaining 
in the study. This problem is especially acute 
if the nature of this process differs between 
the treatment and control groups. The external 
validity of the study may be threatened as well. 
Those individuals remaining in the study may 
be rather unrepresentative of the original sam- 
ple. Even if the parameter estimates have the 
interpretation we want for those participating 
(e.g., an unbiased estimate of the effect of the 
treatment), that estimate may not describe the 
experiences of the larger population. For exam- 
ple, a longitudinal evaluation may have trouble 
retaining low-income participants. In that case, 
the treatment effect estimated using those with 
complete data may not apply to the dropouts. 
(This problem would occur if economic status 
moderated the impact of the intervention or 
treatment.) 

Fortunately, missing data is an active area of 
research in statistics and social science methodology
(Little and Rubin, 2002; Schafer, 1997),
and these methods are gradually influencing
practice. Applied researchers increasingly rec- 
ognize that practices common in the past often 
have a rather dubious statistical foundation. For 
example, the practice of simply replacing miss- 
ing data with the mean distorts the relation- 
ship between that variable and other variables 
in the model or analysis. While better, condi- 
tional mean imputation (i.e., imputing the mean 
for similar individuals) has problems as well. 
That method fails to account for the
variation in the predicted value within the sub- 
groups defined by the variables used to group 
observations. 
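
The distortion caused by unconditional mean imputation is easy to see in a small simulation. The sketch below is illustrative only: the data-generating model, the 40% missingness rate, and the helper functions are assumptions made for the example. It deletes values of y completely at random, fills the gaps with the observed mean, and compares the variance of y and its correlation with x before and after imputation.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (variance(xs) * variance(ys)) ** 0.5


rng = random.Random(1)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

# Delete about 40% of the y values completely at random (illustrative choice).
observed = [rng.random() > 0.4 for _ in range(n)]
x_obs = [xi for xi, keep in zip(x, observed) if keep]
y_obs = [yi for yi, keep in zip(y, observed) if keep]

# Unconditional mean imputation: every gap receives the observed mean of y.
fill = mean(y_obs)
y_imputed = [yi if keep else fill for yi, keep in zip(y, observed)]

var_complete_cases = variance(y_obs)
var_mean_imputed = variance(y_imputed)
corr_complete_cases = correlation(x_obs, y_obs)
corr_mean_imputed = correlation(x, y_imputed)

print(f"Variance of y:    {var_complete_cases:.3f} -> {var_mean_imputed:.3f}")
print(f"Correlation(x,y): {corr_complete_cases:.3f} -> {corr_mean_imputed:.3f}")
```

Because every imputed case sits exactly at the mean, the variance of y shrinks in proportion to the share of observed cases, and the correlation with x is pulled toward zero.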

On the other hand, some procedures are
appropriate in some circumstances but not oth- 
ers. For example, suppose an analyst is inter- 
ested in three variables, Y, X and Z. Suppose 
that the likelihood of missing data depends 
on X and Z (each controlling for the
other). In this case, regression of Y on X
and Z using listwise deleted data would pro- 
duce estimates with desirable statistical prop- 
erties as long as the missing data mechanism is 
“missing at random” (MAR). Described in more 
detail below, MAR means that among individu- 
als with the same values of X and Z, those with 
missing and complete data do not differ in their 
values of Y. In other words, the likelihood of 
missing data does not depend on Y conditional
on X and Z. 

Other analyses of these same data require 
stronger assumptions. For example, suppose 
one just wanted to estimate the correlation 
between Y and X. In that case, a researcher 
using listwise deletion would have to assume 
the missing data mechanism is “missing com- 
pletely at random” (MCAR)—in effect, the 
available data would have to represent a simple 
random sample of the complete and incomplete 
data. If the likelihood of missing data depended 
on Z and Z was related to X or Y, then the 
correlation would not accurately describe the 
relationship between those two variables. 

Much of the methodological work on miss- 
ing data assumes MAR. Multiple imputation, 
for example, begins with the assumption that 
the data are missing at random (Schafer, 1997; 
1999; Schafer and Graham, 2002). Such meth- 
ods are very useful, but in some cases one sus- 
pects that the data are not MAR. In a regression 
context, MAR would not hold if the outcome 
variable differed between individuals who do 
and do not provide data conditional on the 
explanatory variables in the model. In that case, 
the missing data mechanism is said to be “miss- 
ing not at random” (MNAR). 
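
The distinctions drawn above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis used later in the chapter: it simplifies to a single explanatory variable x, and the true model (intercept 0, slope 2) and the dropout probabilities are arbitrary choices. Listwise deletion is applied twice, once with dropout depending only on x (MAR given x) and once with dropout depending on the outcome y itself (MNAR).

```python
import random


def ols(xs, ys):
    """Intercept and slope from a simple least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # true intercept 0, slope 2


def complete_cases(drop_probability):
    """Listwise deletion: keep each case with probability 1 - drop_probability."""
    kept = [(xi, yi) for xi, yi in zip(x, y)
            if rng.random() > drop_probability(xi, yi)]
    return [k[0] for k in kept], [k[1] for k in kept]


# MAR given x: dropout depends only on the explanatory variable.
mar_x, mar_y = complete_cases(lambda xi, yi: 0.8 if xi > 0 else 0.2)
# MNAR: dropout depends on the outcome itself.
mnar_x, mnar_y = complete_cases(lambda xi, yi: 0.8 if yi > 0 else 0.2)

mar_intercept, mar_slope = ols(mar_x, mar_y)
mnar_intercept, mnar_slope = ols(mnar_x, mnar_y)
mar_mean_y = sum(mar_y) / len(mar_y)  # true mean of y is 0

print(f"MAR:  intercept {mar_intercept:.2f}, slope {mar_slope:.2f}")
print(f"MNAR: intercept {mnar_intercept:.2f}, slope {mnar_slope:.2f}")
print(f"MAR complete-case mean of y: {mar_mean_y:.2f}")
```

Under MAR the complete-case regression recovers the true coefficients, although the simple mean of y among respondents is still biased because the data are not MCAR. Under MNAR the regression itself is distorted, here most visibly in the intercept.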

Efforts to assess and correct data for MNAR 
are inherently somewhat speculative. After all, 
a full assessment of MNAR would depend on 
comparisons of the outcome data for those who 
do and do not provide data. Of course, by defi- 
nition, that information is not available for the 
latter. Determining whether the data are MNAR, 
therefore, depends largely on the judgement of 
the researcher. That judgement depends on his 
or her knowledge of the outcome of interest 
and the process that shapes it. For example, 
in a longitudinal evaluation of a delinquency 
prevention program, the researcher may know 
that individuals who are incarcerated are generally
unwilling to participate in the research
study.¹ In that instance, it seems quite likely that
respondents and nonrespondents would differ in
terms of key outcomes, such as delinquency. As we
illustrate below, the data can provide
some clues about whether the data are
MNAR, but ultimately the researcher’s judge- 
ment is the key factor. 

Statistical work in this area falls into two 
strands—selection models and pattern mixture 
models. The former is the subject of a large lit- 
erature in econometrics, and we only briefly 
review it here. Our focus here is to describe 
and illustrate the latter, pattern mixture mod- 
els. Our analyses rely on longitudinal data from 
a large evaluation of service delivery in chil- 
dren’s mental health services. 


2 MAR, pattern mixture 
and selection models 


This section begins by reviewing the technical 
definition of missing at random. 


2.1 A brief review of missing at random 


Following the notation in key texts, such as
Little and Rubin or Schafer (Little and Rubin,
2002; Schafer, 1997), the analyst collects data
and observes Y_obs and a pattern of responses
and missing data, R. The latter allows us to
partition the complete data, Y, into the observed
data, Y_obs, and the missing data, Y_mis. The
likelihood function for θ and ξ can be written as


\[ L(\theta, \xi \mid Y_{obs}, R) \propto f(R, Y_{obs} \mid \theta, \xi) \tag{1} \]


where f is the joint probability distribution for
the observed data (Y_obs) and response indicator
(R). θ characterizes the distribution of Y_obs;
ξ characterizes the distribution of R and
generally represents nuisance parameters. In
most instances, we are interested in θ and
want an estimate that describes the behavior
of all individuals, including those who did not
provide data.


1 And when they do, their ability to commit
additional crimes is diminished while they are
locked up. In that instance, the data of real
interest (what crimes they would commit while in
the community) are effectively missing.

One can rewrite equation (1) as 


\[ f(R, Y_{obs} \mid \theta, \xi) = \int f(R, Y \mid \theta, \xi) \, dY_{mis} \tag{2} \]


Working with the right-hand side of the equation
requires assumptions about the distribution
of the missing data (e.g., that the missing
data have the same distribution as Y_obs). Given
that Y_mis are unobserved, checking such an
assumption is difficult or impossible, and the
resulting parameter estimates are often rather
sensitive to that assumption.

Estimation is simplified if the data are “miss- 
ing at random”. In that case, 


\[ f(R \mid Y_{obs}, Y_{mis}, \xi) = f(R \mid Y_{obs}, \xi) \tag{3} \]


for all values of ξ and Y_mis, evaluated at the
observed values of R and Y_obs. In that case,
equation (2) can be simplified to the following:


\[ f(R, Y_{obs} \mid \theta, \xi) = f(R \mid Y_{obs}, \xi) \, f(Y_{obs} \mid \theta) \tag{4} \]


As a result, the likelihood function can be
partitioned into two pieces: the first involves the
parameter of interest, θ; the second involves the
nuisance parameter, ξ. Combined with an additional
assumption (that θ and ξ are distinct),
inferences about θ can ignore the missing data
mechanism (the probability of response). In this
case, the missing data mechanism is said to
be “ignorable”. To be clear, in this instance, one
can analyze the available data as if they were
complete, at least for some purposes.
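
The step from the integral form of the observed-data likelihood to the factored form under MAR can be spelled out. The sketch below assumes, as is standard in this literature but not stated explicitly above, that the complete-data density factors as f(Y | θ) f(R | Y, ξ); it is an illustration of the argument, not a reproduction of the cited texts.

```latex
\begin{align*}
f(R, Y_{obs} \mid \theta, \xi)
  &= \int f(Y_{obs}, Y_{mis} \mid \theta)\,
          f(R \mid Y_{obs}, Y_{mis}, \xi)\, dY_{mis} \\
  &= f(R \mid Y_{obs}, \xi) \int f(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis}
     && \text{(MAR: the response term does not involve } Y_{mis}\text{)} \\
  &= f(R \mid Y_{obs}, \xi)\, f(Y_{obs} \mid \theta).
\end{align*}
```

Because the first factor does not involve θ, maximizing the likelihood over θ uses only the second factor, provided θ and ξ are distinct.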

A fair bit of confusion surrounds the meaning
of “missing at random” in practice. MAR can
accommodate a wide variety of missing data
mechanisms. For example, individuals who
do and do not provide data may differ quite
dramatically; evidence that such differences
exist is not evidence that MAR does not apply.
Rather, the key issue for MAR is that the
likelihood of response does not depend on Y_mis.
MAR requires only that the available data
represent a random sample of all values within
subclasses defined by Y_obs (Schafer, 1997). In
a multivariate regression, MAR means that the
likelihood of missing data can depend on the
covariates; in a longitudinal context, missingness
may even depend on past values of the
dependent variable. In intervention studies and
clinical trials, the likelihood of response can
vary with treatment status. Treatment-control
differences in the response rate are not inherently
problematic for analyses of treatment impact
(Foster and Bickman, 1996).

A key complication surrounding
MAR is that one’s ability to ignore the missing
data mechanism may depend on the type and
purposes of the analysis. For example, suppose
that in a study of black-white differences in
earnings, more educated workers are less likely
to participate. In that case, descriptive statistics
of mean earnings will be incorrect for both
black and white workers. However, regression
analyses of the between-group differences may
be correct, as long as education is included
as a regressor.² Within subgroups defined by
race and education, MAR applies if those who
respond do not differ systematically from those
who do not.
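
The earnings example can be illustrated with a small simulation. The sketch below is hypothetical throughout: the variable names, coefficients, and response model are assumptions chosen for illustration, not values from any study. Response depends only on education, so MAR holds once education is conditioned on.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population: a two-group indicator, years of education, earnings.
group = rng.integers(0, 2, n)
educ = rng.normal(12.0 + 1.0 * group, 2.0, n)
earn = 10.0 + 2.0 * group + 1.5 * educ + rng.normal(0.0, 3.0, n)

# Response depends on education only: more educated workers respond less often.
# Given education, response is unrelated to earnings, so MAR holds.
p_respond = 1.0 / (1.0 + np.exp(0.8 * (educ - 12.5)))
respond = rng.random(n) < p_respond


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


X = np.column_stack([np.ones(n), group, educ])
gap_full = ols(earn, X)[1]                    # adjusted group gap, complete data
gap_resp = ols(earn[respond], X[respond])[1]  # adjusted group gap, respondents only

for g in (0, 1):
    full_mean = earn[group == g].mean()
    resp_mean = earn[respond & (group == g)].mean()
    print(f"group {g}: mean earnings {full_mean:.2f} (all) vs {resp_mean:.2f} (respondents)")
print(f"adjusted gap: {gap_full:.2f} (all) vs {gap_resp:.2f} (respondents)")
```

In this setup the respondent means understate earnings in both groups, while the regression that includes education recovers a group gap close to the value built into the simulation (2.0).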

Furthermore, the analyst has some control
over the applicability of MAR. A regression, for
example, with more covariates will extend the
reach of MAR; as more covariates are added,
patterns of missingness related to the added
covariates are brought under the reach of MAR.
Intuitively, as one conditions on more and more
variables, the likelihood that the missing data
mechanism represents a form of random sampling
within classes defined by the covariates
increases. Similarly, a fixed-effects analysis is
robust to a broader array of missing data
mechanisms than is a random-effects model
(Verbeek and Nijman, 1992).


2 And of course, the model for earnings is specified
correctly.

As noted, however, missing at random (MAR) 
still may not apply, even when an extensive 
range of covariates have been added. In those 
instances, the unobserved values of the out- 
come of interest may directly affect the likeli- 
hood of response. Alternatively, the covariates 
may not capture completely shared predictors 
of the outcome and response. For example, in a 
longitudinal study of delinquency, individuals 
with the greatest propensity to offend may be 
less likely to provide data—they may be unwill- 
ing to provide data on criminal activities or they 
may be incarcerated (Foster, Fang and Conduct 
Problems Prevention Research Group, 2004). In 
that case, the data are said to be “missing not at 
random” (MNAR). 


2.2 Alternative MNAR approaches: 
Selection models 


Under MNAR, the likelihood function cannot
be partitioned as in equation (4). Instead, a
selection model factors the joint distribution as


\[ f(R, Y \mid \theta, \xi) = f(Y \mid \theta) \, f(R \mid Y, \xi) \tag{5} \]


The likelihood depends on a model of the
complete data (i.e., both Y_mis and Y_obs) and
a model predicting the likelihood of response.
The latter depends explicitly on both the
observed and missing values of the dependent
variable. In terms of implementation, obtaining
estimates of θ and ξ involves estimation of
equations predicting missingness and the outcome
of interest. However, without further assumptions,
the model is not identified: the observed
data could be explained by any of a multitude of
models of Y and of R. One identifying assumption
might involve the distribution of the Y variables
(e.g., normality). Another means involves
an exclusion restriction: a variable that affects
response but not the outcome directly. For
example, in a longitudinal study, one might use
characteristics of prior interviews, such as their
length (e.g., Lillard and Panis, 1998).

Estimation allows for the interdependency 
between nonresponse and that outcome, either 
by allowing participation to predict the level of 
the outcome directly or, as is common in the 
econometric literature, allowing a correlation 
between the unobserved determinants of non- 
response and the outcome. Prior to the wide 
accessibility of maximum likelihood estima- 
tion, economists estimated selection models in 
two steps. First, one would estimate a probit 
equation predicting nonresponse. The results of 
that equation were used to calculate the inverse 
Mills ratio, a nonlinear function of the probability
of participation. In the second stage, the
analyst includes that ratio as an explanatory 
variable in a suitable model used to predict the 
outcome (Wooldridge, 2002, p. 550). 
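
The two-step procedure just described can be sketched on simulated data. Everything below is an assumption made for illustration: the data-generating values, the exclusion variable `w`, and the hand-rolled probit fitted by Fisher scoring (a packaged probit routine would normally be used instead). The ratio is written here for respondents, so it is a function of the estimated response index.

```python
import numpy as np
from math import erf, pi, sqrt


def norm_pdf(t):
    return np.exp(-0.5 * t**2) / sqrt(2.0 * pi)


def norm_cdf(t):
    return 0.5 * (1.0 + np.vectorize(erf)(t / sqrt(2.0)))


def fit_probit(Z, d, n_iter=20):
    """Probit coefficients by Fisher scoring (d is a 0/1 response indicator)."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        index = Z @ gamma
        P = np.clip(norm_cdf(index), 1e-10, 1.0 - 1e-10)
        pdf = norm_pdf(index)
        score = Z.T @ (pdf * (d - P) / (P * (1.0 - P)))
        info = (Z * (pdf**2 / (P * (1.0 - P)))[:, None]).T @ Z
        gamma = gamma + np.linalg.solve(info, score)
    return gamma


rng = np.random.default_rng(1)
n = 40_000
x = rng.normal(size=n)   # covariate in both equations
w = rng.normal(size=n)   # hypothetical exclusion restriction: affects response only
u = rng.normal(size=n)   # unobserved determinant of response
e = 0.7 * u + sqrt(1.0 - 0.7**2) * rng.normal(size=n)  # corr(u, e) = 0.7
respond = (0.3 + 0.5 * x + 1.0 * w + u) > 0
y = 1.0 + 2.0 * x + e    # outcome; used only where respond is True

# Step 1: probit for response, then the inverse Mills ratio.
Z = np.column_stack([np.ones(n), x, w])
gamma_hat = fit_probit(Z, respond.astype(float))
index = Z @ gamma_hat
mills = norm_pdf(index) / norm_cdf(index)

# Step 2: OLS among respondents, without and with the ratio as a regressor.
X_naive = np.column_stack([np.ones(n), x])[respond]
X_heck = np.column_stack([np.ones(n), x, mills])[respond]
b_naive = np.linalg.lstsq(X_naive, y[respond], rcond=None)[0]
b_heck = np.linalg.lstsq(X_heck, y[respond], rcond=None)[0]
print("naive intercept, slope:", b_naive.round(2))
print("two-step intercept, slope, Mills coefficient:", b_heck.round(2))
```

In this simulation the naive intercept is pushed away from its true value of 1.0, while the two-step estimates sit close to the values used to generate the data. The correction leans on the assumed joint normality of the two error terms and on the exclusion variable, which is the sensitivity the text warns about.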

In essence, selection models replace the MAR 
assumption with an alternative assumption. In 
many instances, that assumption may be more 
plausible than MAR. Regrettably, the results 
of these analyses are potentially sensitive to 
these assumptions, which are untestable. The 
best one can do is to examine the sensitivity 
of key findings to alternative parameterizations. 
Diggle and Kenward (1994) present a selec- 
tion model in a well-known example involving 
milk production and mastitis among dairy cat- 
tle. The authors allow the likelihood of miss- 
ing data to depend on the dependent variable, 
milk production. The model is identified by 
the assumed normality of the milk production 
variable. In the original analysis, the authors 
rejected MAR—they found that unobserved val- 
ues of the dependent variable affect the likeli- 
hood of response (Diggle and Kenward, 1994). 
Kenward (1998), however, determined that the 
findings were quite sensitive to distributional 
assumptions and to the handling of two out- 
lier cases. In particular, if the two cases were 
removed or an alternative distribution was used 
for the outcome of interest, a MAR model fit the 
data as well as the MNAR model fit (Kenward, 




1998). This sensitivity is common and should 
be carefully examined (Little and Rubin, 2002). 
For an example, see Foster, Fang and Conduct 
Problems Prevention Research Group (2004). 
Arguing that the identification assumptions are 
often obscure, statisticians often eschew selec- 
tion models and have developed pattern mix- 
ture models as an alternative. 


2.3 Alternative MNAR approaches: pattern 
mixture models 


The pattern mixture model factors the likelihood
function differently than does the selection model.
In particular, this model allows the outcome
variable to depend explicitly on the pattern of
response (rather than vice versa):


\[ f(R, Y \mid \theta, \xi) = f(Y \mid R, \theta) \, f(R \mid \xi) \tag{6} \]


Obtaining estimates of θ and ξ involves
dividing observations into groups defined by 
study participation. For example, one might 
group observations according to the number 
of waves of data collection in which an indi- 
vidual participated. Then one estimates the 
outcome model of interest separately for each 
of those subgroups (i.e., conditioning on the 
response patterns). 

Advocates of this approach favor it over 
selection models not because the model does 
not depend on assumptions but because those 
assumptions are more transparent. In parti- 
cular, one estimates parameters for alternative 
groups of observations defined by their missing
data patterns. For some combinations of parameters
and groups, however, estimating the model is not
possible. For example, suppose
that one is analyzing the effect of study
dropout on a dataset that offers as many as 
six waves of data (like the empirical example 
below). The substantive model of interest relies 
on a standard growth curve to explain trends 
over time in the outcome of interest. Suppose 
that the model includes a quadratic time term, 
allowing for a nonlinearity in growth over time. 


Following the pattern mixture approach, one 
would estimate the model separately for the 
six subgroups defined by the number of waves 
for which data are available. In that case, the 
quadratic time term could not be estimated for 
cases with only two waves of data. Clearly,
some assumption needs to be made: one can
assume that the parameter is equal to that of the
most similar group (those with three waves of data)
or of the largest group (those with the most data).
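
The identification problem for the quadratic term can be checked directly. A minimal sketch, assuming waves are coded 0, 1, 2, ... (the coding is an assumption; the chapter does not state it): with two waves, the design matrix for an intercept, a linear term, and a quadratic term has only two distinct rows, so three parameters cannot be separated.

```python
import numpy as np


def growth_design(n_waves):
    """Design matrix [1, t, t^2] for waves t = 0, ..., n_waves - 1."""
    t = np.arange(n_waves, dtype=float)
    return np.column_stack([np.ones(n_waves), t, t**2])


rank_two_waves = np.linalg.matrix_rank(growth_design(2))
rank_three_waves = np.linalg.matrix_rank(growth_design(3))
print(rank_two_waves, rank_three_waves)  # prints: 2 3
```

A rank of 2 with three columns means the quadratic coefficient is not estimable in the two-wave group, which is why a value must be borrowed from another pattern.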

The primary disadvantage of this method is 
that the estimates for any single subgroup are 
not of special interest. Rather, the estimates of 
interest are those for the population as a whole, 
effectively requiring the analyst to combine esti- 
mates across the subgroups (Fitzmaurice, Laird 
and Shneyer, 2001). As a result, this metho- 
dology requires an additional step. 
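
The additional step can be sketched as follows: estimate the growth parameter within each dropout pattern, then average over the distribution of patterns. The simulated data, slopes, and pattern shares below are hypothetical, and the sketch assumes each pattern's trajectory would have continued unchanged after dropout, which is itself an untestable identifying restriction.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj = 20_000

# Hypothetical setup: subjects provide 2, 3, or 4 waves, and those who
# drop out earlier have flatter trajectories (so dropout is informative).
true_slope = {2: -0.5, 3: -1.0, 4: -2.0}
waves = rng.choice([2, 3, 4], size=n_subj, p=[0.3, 0.3, 0.4])


def ols_slope(t, y):
    """Slope from a least-squares regression of y on an intercept and t."""
    X = np.column_stack([np.ones_like(t), t])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


slopes, weights, t_parts, y_parts = {}, {}, [], []
for k in (2, 3, 4):
    m = int((waves == k).sum())                 # subjects in this pattern
    t = np.tile(np.arange(k, dtype=float), m)   # waves coded 0, ..., k - 1
    y = 60.0 + true_slope[k] * t + rng.normal(0.0, 2.0, t.size)
    slopes[k] = ols_slope(t, y)                 # pattern-specific estimate
    weights[k] = m / n_subj                     # share of subjects in pattern
    t_parts.append(t)
    y_parts.append(y)

# Additional step: average pattern-specific slopes over the pattern distribution.
pattern_mixture_slope = sum(weights[k] * slopes[k] for k in slopes)
# For contrast: a single regression that ignores the dropout pattern.
pooled_slope = ols_slope(np.concatenate(t_parts), np.concatenate(y_parts))
population_slope = 0.3 * -0.5 + 0.3 * -1.0 + 0.4 * -2.0  # -1.25 by construction

print(f"pattern-mixture: {pattern_mixture_slope:.2f}, pooled: {pooled_slope:.2f}, "
      f"population: {population_slope:.2f}")
```

Here the pooled regression is dominated by those who stay longest, whereas the weighted average of pattern-specific slopes lands near the population value built into the simulation.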

To avoid this problem in a longitudinal 
analysis, Fitzmaurice and colleagues suggest 
an alternative method for parameterizing the 
outcome equation predicting Y. In particular, to 
capture the relationship between the outcome 
of interest and the likelihood of nonresponse, 
they suggest including the number of waves 
an individual participates in a study as a set of 
covariates, with one dummy variable represent- 
ing each of the possible number of waves. They 
demonstrate that one can center those variables 
in a way that the main effect of the other 
covariates represents the population estimate 
of the parameters of interest. The centering 
involves estimating a multinomial logit model 
predicting the number of waves an individ- 
ual participates. One then uses the resulting 
parameter estimates to calculate the predicted 
probability of participating in a given number 
of waves. A function of those predictions is 
then used to center the participation dummies. 
For details, see Fitzmaurice, Laird and Shneyer 
(2001). As a result, the predicted value for Y
does not depend on the participation dummies
(the expected values of which are zero), and the
parameters on the other covariates
(such as gender or race) have their standard
interpretation.
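
The effect of centering can be seen in a stripped-down case. The sketch below is a simplification, not the procedure of Fitzmaurice and colleagues: it centers each pattern dummy at the sample share of that pattern, which corresponds to a first-stage model with no covariates, rather than at predicted probabilities from a multinomial logit. All data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_subj = 6_000
# Hypothetical slopes, indexed by the number of waves observed (2, 3, or 4).
slope_by_waves = np.array([0.0, 0.0, -0.5, -1.0, -2.0])
waves = rng.choice([2, 3, 4], size=n_subj, p=[0.3, 0.3, 0.4])

# Long format: one row per subject-wave, with waves coded 0, 1, ...
k_long = np.repeat(waves, waves)
t_long = np.concatenate([np.arange(k, dtype=float) for k in waves])
y_long = 60.0 + slope_by_waves[k_long] * t_long + rng.normal(0.0, 2.0, t_long.size)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Share of subjects in each pattern; dummies are centered at these shares.
share = {k: float(np.mean(waves == k)) for k in (2, 3, 4)}
c3 = (k_long == 3) - share[3]   # pattern 2 is the reference category
c4 = (k_long == 4) - share[4]

X = np.column_stack([np.ones(t_long.size), t_long, c3, c4, c3 * t_long, c4 * t_long])
time_main_effect = ols(y_long, X)[1]

# Check against the explicit weighted average of pattern-specific slopes.
pattern_slopes = {}
for k in (2, 3, 4):
    rows = k_long == k
    Xk = np.column_stack([np.ones(rows.sum()), t_long[rows]])
    pattern_slopes[k] = ols(y_long[rows], Xk)[1]
weighted_average = sum(share[k] * pattern_slopes[k] for k in (2, 3, 4))

print(f"main effect of time: {time_main_effect:.4f}")
print(f"weighted average of pattern slopes: {weighted_average:.4f}")
```

With the dummies centered this way, the coefficient on time in the single regression coincides with the pattern-weighted average slope, so no separate averaging step is needed.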


3 Empirical application: methods 


One effort to improve mental health services for 
children and youth revolves around the philos- 
ophy of “a system of care”. Key components 
of this philosophy include community-based 
alternatives to out-of-home placements, fam- 
ily involvement, cultural sensitivity, and inter- 
agency cooperation and coordination (Stroul 
and Friedman, 1996). 

A federal effort to promote system reform 
according to these principles involves the Com- 
prehensive Community Mental Health Services 
for Children and Their Families Program, estab- 
lished in 1992 by the Center for Mental Health 
Services (CMHS) in the Substance Abuse and 
Mental Health Services Administration (Center 
for Mental Health Services, 1999). This program 
has provided over $100 million to 121 commu- 
nities over the past 14 years for the develop- 
ment of local systems of care and has served over 
67,000 children and their families nationwide. 

The CMHS grants support expanded pro- 
vision of community-based, culturally sensi- 
tive services and encourage the development of 
interagency coordination. With regard to the for- 
mer, recipient communities must develop a full 
range of services, including diagnosis and eval- 
uation, case management, outpatient therapy, 
24-hour emergency services, intensive home- 
based care, intensive day treatment, respite care, 
therapeutic foster care, and transition services. 
To ensure that programs continue after federal 
funding, local sites are required to develop a 
match for federal funds; the level of matching 
funds must increase during the course of the 
grant. 

Within this rather broad framework, local 
sites can tailor their system to the strengths, 
resources, and needs of their community. As a 
result, the sites differ in terms of the age and 
needs of the children and adolescents served as 
well as the portals through which children enter 


the system (e.g., child welfare, juvenile justice, 
community mental health). They also differ in 
the services offered and in the settings in which 
services are delivered (e.g., school or home). 

The program is the focus of a national, mul- 
tisite, multicomponent evaluation, which pro- 
vides the data for this article. One critical 
evaluation component consists of a quasi- 
experimental study that matches and com- 
pares funded system-of-care communities with 
similar nonfunded communities. Our analy- 
ses here focus on two of these pairs, one 
in Nebraska and one in Alabama. The catch- 
ment area for the system-of-care grant-funded 
program in Alabama is Jefferson County (the 
Jefferson County Community Partnership) and 
includes the city of Birmingham. The matched 
comparison community is located in four 
contiguous counties that are served by the 
Montgomery Area Mental Health Authority. 
The implementation of interagency approaches 
by the Jefferson County Community Partner- 
ship includes particular focus on children with 
mental health or behavioral problems who are 
involved in the juvenile justice system. In 
Nebraska, Behavioral Health Region III is the 
system-of-care program (Nebraska Family Cen- 
tral) and is based in Kearney. In Nebraska 
Behavioral Health Region IV, the comparison 
community is based in Norfolk. Each of these 
regions covers a 22-county rural area with a 
span of approximately 15,000 square miles. 

As part of the evaluation, a sample of 939 
children and adolescents aged 4 to 17 with 
serious emotional and behavioral problems 
who were using mental health services were 
recruited for the longitudinal comparison study 
along with their caregivers. Study enrollment 
began in August 1999 and continued through 
May 2003 with follow-up data collection con- 
tinuing through May 2004. For most youth, 
entry into the study coincided with entry into 
services. For youths who had received services 
in the past, entry into the study coincided with 
a new episode of care.

Data on the youths’ mental health and on
family demographics were collected through 
face-to-face interviews with caregivers and their 
children. Interviews were conducted at study 
entry and then at subsequent six-month inter- 
vals. The caregiver interviews provided a wide 
range of information on factors related to the 
need for mental health services. These charac- 
teristics included demographic information on 
the children, Medicaid enrollment status, and 
child and family risk factors, and provided the 
basis for matching across sites. The interviews 
also provided the outcomes for our analyses. 

Study participants were generally between 
the ages of 4 and 17 at recruitment. The median 
age was fairly comparable across the four 
groups defined by site and SOC status, ranging 
from 12 to 14. About one-third of participants
were female; the share ranged from 27% (NE SOC) to
44% (AL TAU). As one would expect, the racial
and ethnic composition of the groups differed 
substantially between AL and NE. 71% and 
83% of the NE TAU and SOC groups, respec- 
tively, were non-Latino white. Those figures 
were 31% and 35% in Alabama. In NE, the non- 
whites were predominantly Latino, with some 
African-Americans and Native Americans. In 
AL, the majority of the sample was African- 
American. 

These outcomes included well-accepted mea- 
sures of the child’s mental health, such as the 
Child and Adolescent Functional Assessment 
Scale (CAFAS; Hodges, 1990; Hodges and Gust, 
1995; Hodges and Wong, 1997) and the Child 
Behavior Checklist (CBCL; Achenbach, 1978, 
1991; Achenbach and Edelbrock, 1979, 1981). 
The CAFAS assesses child functioning in eight 
domains, while the CBCL measures behavioral 
and emotional symptoms. Higher CBCL scores 
indicate more behavioral problems. A higher 
CAFAS score indicates more functional impairment.
Reductions in either measure represent
improvements in the child’s mental health.

Data are available for as many as seven waves 
of data collection for participating youth. 


4 Empirical applications: results 


4.1 Descriptive statistics 


Table 12.1 provides descriptive statistics for the 
outcome measures. Several patterns are apparent.
First, attrition over time is fairly high. By the
seventh wave, roughly three-fourths or more of the
sample had been lost in each group (73% to 90%).
At both sites and regardless of SOC status, at
least one-third of the sample was lost by the fourth
wave. Second, attrition varies with SOC status,
but the nature of that difference varied across 
sites. In Alabama, attrition was higher in the 
SOC site; the pattern was reversed in Nebraska. 
Third, regardless of site or SOC status, children 
showed improvement over time. 
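
The attrition figures follow directly from the observation counts. As a quick check, the sketch below recomputes the percentage missing for the Alabama pair from the counts reported in Table 12.1, taking the wave-1 count as the baseline (an assumption that matches the 0% entries at wave 1).

```python
# Observed counts by wave for the Alabama pair, copied from Table 12.1.
obs = {
    "AL system of care": [202, 146, 111, 80, 63, 47, 20],
    "AL comparison": [189, 167, 153, 124, 101, 78, 48],
}


def percent_missing(counts):
    """Percent of the wave-1 sample without data at each wave."""
    baseline = counts[0]
    return [round(100 * (1 - c / baseline)) for c in counts]


attrition = {group: percent_missing(c) for group, c in obs.items()}
for group, pct in attrition.items():
    print(group, pct)
# AL system of care [0, 28, 45, 60, 69, 77, 90]
# AL comparison [0, 12, 19, 34, 47, 59, 75]
```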


4.2 Growth curves 


Tables 12.2 and 12.3 present mixture model 
estimates of parameters and standard errors 
using the comparison study data from Alabama 
and Nebraska. 

Model 1 is estimated assuming that dropout is 
ignorable and includes a set of basic predictors. 
SOC is an indicator that equals unity if a child 
is served in a system-of-care community. Time 
denotes a wave of data collection, and SOC ×
Time is the interaction between the SOC and Time
indicators. Model 2 assumes that dropout is
nonignorable and, following Fitzmaurice et al.,
introduces additional controls that include the
residuals from the first-stage multinomial logit
and the interactions of the sum of the first-stage
residuals with the basic predictors described
above.³
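
For orientation, the specification that Model 1 appears to imply can be written out. This is a sketch under assumptions: the chapter does not print the equation, and the random intercept and random slope shown here are inferred from the reference to random effects rather than taken from the text.

```latex
\[
Y_{ij} = \beta_0 + \beta_1\,\mathrm{SOC}_i + \beta_2\,\mathrm{Time}_{ij}
       + \beta_3\,(\mathrm{SOC}_i \times \mathrm{Time}_{ij})
       + b_{0i} + b_{1i}\,\mathrm{Time}_{ij} + \varepsilon_{ij}
\]
```

Here i indexes children and j indexes waves. In this notation β₃ is the treatment effect of interest, the difference in the rate of change between system-of-care and comparison communities; Model 2 adds the centered participation terms and their interactions to the same equation.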


3 The models presented here were estimated assum- 
ing all variances and covariances to be distinct. We 
have re-estimated the models allowing a distinct 
variance for each random effect within a random- 
effects equation and restricting all covariances to be 
zero. The conclusions were not affected by the speci- 
fication of the variance-covariance matrix. The addi- 
tional results are available upon request. 
Table 12.1 Descriptive statistics, by wave and site


Alabama Pair
         System of Care                          Comparison
               CBCL   CBCL                             CBCL   CBCL
Wave     Obs   Ext    Int    CAFAS  % Missing    Obs   Ext    Int    CAFAS  % Missing
1  mean  202   69.11  61.58  3.93   0%           189   66.68  63.67  3.49   0%
   sd          10.61  10.93  1.01                      11.70  11.85  1.03
2  mean  146   66.27  59.85  3.52   28%          167   64.60  61.31  3.23   12%
   sd          11.95  11.50  1.13                      11.69  12.60  1.14
3  mean  111   64.03  56.19  3.21   45%          153   62.76  58.54  3.10   19%
   sd          10.73  12.28  1.19                      11.92  12.99  1.09
4  mean  80    63.99  58.06  3.21   60%          124   60.90  57.47  3.10   34%
   sd          13.94  12.59  1.32                      12.07  12.92  0.98
5  mean  63    63.71  57.38  2.95   69%          101   60.13  56.33  3.12   47%
   sd          10.84  11.16  1.11                      10.47  11.39  1.08
6  mean  47    62.55  57.06  3.00   77%          78    56.65  53.78  3.00   59%
   sd          11.78  12.69  1.13                      12.41  11.82  1.10
7  mean  20    61.35  55.05  3.00   90%          48    56.79  51.10  2.92   75%
   sd          7.01   9.72   1.08                      11.02  10.90  1.03

Nebraska Pair
         System of Care                          Comparison
               CBCL   CBCL                             CBCL   CBCL
Wave     Obs   Ext    Int    CAFAS  % Missing    Obs   Ext    Int    CAFAS  % Missing
1  mean  321   70.14  66.24  4.00   0%           222   66.83  62.73  3.92   2%
   sd          9.41   10.80  0.87                      10.24  10.16  1.03
2  mean  286   65.35  61.79  3.53   11%          198   62.27  58.87  3.40   12%
   sd          11.05  11.68  1.03                      11.25  11.78  1.17
3  mean  242   62.43  58.55  3.12   25%          163   59.82  57.16  3.08   28%
   sd          10.79  11.19  1.04                      12.17  11.18  1.25
4  mean  192   61.67  58.29  3.14   40%          118   59.44  56.08  3.02   48%
   sd          11.46  11.96  1.05                      12.30  11.77  1.21
5  mean  144   60.40  56.13  2.90   55%          82    59.11  54.87  3.01   64%
   sd          11.66  12.32  1.16                      9.89   11.07  1.20
6  mean  109   59.50  55.39  2.84   66%          48    59.27  54.98  2.92   79%
   sd          11.27  12.40  1.08                      11.33  12.79  1.11
7  mean  88    58.35  53.43  2.84   73%          29    55.66  50.07  2.50   87%
   sd          12.41  13.66  1.19                      9.21   11.78  1.17
Table 12.2 Alabama comparison study: estimates of treatment effects


ALABAMA
                  CBCL Internalizing  CBCL Externalizing  CAFAS
Parameter         M1        M2        M1        M2        M1        M2
Intercept         65.10***  64.82***  67.77***  67.53***  3.41***   3.48**
                  (0.90)    (0.98)    (0.87)    (0.93)    (0.08)    (0.09)
SOC               -3.18**   -2.74**   1.95      2.36*     0.63***   0.55***
                  (1.29)    (1.39)    (1.23)    (1.33)    (0.12)    (0.12)
Time              -1.83**   -1.72***  -1.45***  -1.22***  -0.04**   -0.03
                  (0.21)    (0.26)    (0.18)    (0.23)    (0.02)    (0.02)
SOC × Time        0.66*     0.42      -0.09     -0.43     -0.16***  -0.17**
                  (0.34)    (0.38)    (0.29)    (0.33)    (0.03)    (0.04)
D*_1                        -0.48               0.55                0.92***
                            (2.42)              (2.34)              (0.21)
D*_2                        -0.37               2.55                0.61**
                            (2.48)              (2.43)              (0.22)
D*_3                        1.52                1.97                0.68***
                            (2.28)              (2.23)              (0.20)
D*_4                        -1.21               0.49                0.58**
                            (2.40)              (2.37)              (0.21)
D*_5                        1.27                2.24                0.72**
                            (2.42)              (2.39)              (0.21)
D*_6                        1.20                1.47                0.49**
                            (2.31)              (2.26)              (0.20)
D* × SOC                    1.45                -1.02               -0.43
                            (3.46)              (3.35)              (0.31)
D* × SOC × Time             -0.82               -1.06*              -0.06
                            (0.72)              (0.61)              (0.07)
D* × Time                   0.32                0.52                -0.02
                            (0.44)              (0.37)              (0.04)

Note: standard errors are in parentheses.
* significant at 10% level
** significant at 5% level
*** significant at 1% level
Table 12.3 Nebraska comparison study: estimates of treatment effects


NEBRASKA
                  CBCL Internalizing  CBCL Externalizing  CAFAS
Parameter         M1        M2        M1        M2        M1        M2
Intercept         64.00***  64.09***  67.59***  67.76***  3.96***   3.96***
                  (0.78)    (0.81)    (0.75)    (0.77)    (0.07)    (0.08)
SOC               2.96**    2.91**    3.03***   3.23***   0.09      0.14
                  (1.01)    (1.06)    (0.96)    (1.01)    (0.09)    (0.10)
Time              -2.09***  -2.15**   -2.09**   -2.22***  -0.22**   -0.23***
                  (0.22)    (0.23)    (0.21)    (0.22)    (0.02)    (0.02)
SOC × Time        -0.11     -0.10     -0.03     -0.09     0.00      -0.02
                  (0.27)    (0.31)    (0.27)    (0.30)    (0.03)    (0.03)
D*_1                        1.30                4.57*               a7
                            (2.54)              (2.41)              (0.23)
D*_2                        1.37                4.59**              0.73
                            (2.36)              (2.25)              (0.21)
D*_3                        -0.16               1.41                0.56***
                            (2.31)              (2.19)              (0.21)
D*_4                        0.61                2.94                0.51**
                            (2.35)              (2.23)              (0.21)
D*_5                        2.63                5.22**              0.70***
                            (2.39)              (2.27)              (0.21)
D*_6                        0.64                3.54                0.64***
                            (2.57)              (2.45)              (0.23)
D* × SOC                    -1.74               -2.53               -0.53**
                            (2.56)              (2.43)              (0.23)
D* × SOC × Time             0.31                0.62                0.06
                            (0.57)              (0.56)              (0.06)
D* × Time                   -0.40               -1.15*              -0.13***
                            (0.47)              (0.45)              (0.05)

Note: standard errors are in parentheses.
* significant at 10% level
** significant at 5% level
*** significant at 1% level
The results indicate that accounting for nonrandom
dropout rates does have a slight effect
on the estimates but does not result in substantive
changes in the conclusions. Analysis
of data from Alabama sites (n = 391) revealed
that controlling for nonrandom dropout rates
reduces the magnitude of the estimated coefficients
associated with Time for all three outcomes. When
the dependent variable was the CAFAS score,
the time effect that had been marginally significant
in the basic model became insignificant.
The Model 1 estimates of treatment effects
(SOC × Time) indicated that CBCL Internalizing
Problems scores improved faster in the
control community. Accounting for nonrandom
dropout rates reduced the value of the treatment
effect and resulted in no statistically significant
differences between the two communities.
Though no significant treatment effects on
CBCL externalizing scores were detected, the
magnitude of the SOC × Time coefficient increased
substantially (from -0.09 to -0.43) under Model 2.
With the CAFAS score as the dependent variable, the
estimates of both models indicated that children
served in a system of care improved significantly
faster than children served in a more
traditional setting.

In contrast to the findings for the Alabama
sites, analysis of data from Nebraska (n =
543) revealed that controlling for nonrandom
dropout rates increases the magnitude of the
estimated coefficients associated with Time for all
three outcomes. Both Model 1 and Model 2 estimates
of treatment effects showed no significant differences
between the sites in rates of improvement
in any of the three outcomes.


5 Discussion 


Nonignorable nonresponse remains a challenge 
for methodologists and applied researchers 
alike. Fortunately, the range of potential reme- 
dies continues to expand in light of both 
theoretical and computational advances. As 


discussed, however, no method is likely to pro- 
duce the “final” or “best” answer in the near 
term. Rather, the alternative methods represent 
a range of plausible solutions, and researchers 
can only rely on their judgement to select an 
overall strategy (e.g., pattern mixture vs selec- 
tion) or a specification of a strategy (e.g., which 
interaction terms to include in the pattern mix- 
ture model). 

The best approach, therefore, likely involves 
estimating key model parameters under alterna- 
tive sets of assumptions. When the results of the 
analysis are invariant to the handling of missing 
data, as seems to be the case here, the analyst is 
left with a rather tidy situation. More challeng- 
ing are situations where the different methods 
produce different estimates of key parameters. 
In that case, the analyst is left to pick among
the various models according to which set of
assumptions is most tolerable.


Chapter 13


Graphical techniques for exploratory 
and confirmatory analyses 
of longitudinal data 


Garrett M. Fitzmaurice


1 Introduction


The very first steps in an analysis of longitudinal
data usually include an examination
of simple descriptive statistics, with the
goal of obtaining some insights about the patterns
of change in the response over time.
Although these descriptive statistics can be presented
in a table, for comparative purposes a
graphical display of the same information is
usually far more revealing. Graphical displays
are extraordinarily useful techniques for conveying
information about the most salient features
of longitudinal data. These graphical tools
can provide insights about patterns of change
in the mean response over time (e.g., linearity
or the lack thereof) and the choice of suitable
functional forms for covariates. Ordinarily, a
graphical assessment of longitudinal data precedes
any formal statistical analyses. This preliminary
aspect of longitudinal data analysis is,
for the most part, exploratory in nature. Graphical
techniques also play an important role in
the concluding stages of longitudinal data analysis.
A final statistical analysis of longitudinal
data is not complete without an assessment
of the adequacy of the fitted model; the latter
often involves a graphical examination of residuals.
Plots of residuals are especially helpful
for model checking in the confirmatory stages 
of the analysis. Plots of the residuals are use- 
ful not only for revealing systematic trends but 
also for highlighting anomalies (e.g., potential 
outliers). In this chapter, we will focus on some 
graphical techniques commonly used for these 
two important and complementary aspects of 
longitudinal data analysis. However, before dis- 
cussing any particular technique, we introduce 
two examples that will be used to illustrate 
the application of these graphical methods. The 
first example is from a randomized longitudinal 
clinical trial, the second is from an observa- 
tional study. 


1.1 Treatment of Lead-Exposed Children 
(TLC) Trial 


It is now well-established that exposure to lead 
can produce cognitive impairment, especially 
among young children and infants. Although 
the use of lead as an additive in gasoline has 
been discontinued, at least in the United States, 
resulting in a dramatic reduction in airborne 
lead levels, a small percentage of children continue
to be exposed to lead at levels that can
produce impairment. Much of this exposure is
due to chipping and peeling lead-based paint
in older homes. Lead paint chips and lead-contaminated
paint dust are ingested by young
children during normal teething and hand-to-mouth
behavior. The United States Centers for
Disease Control and Prevention (CDC) has concluded
that children with blood lead levels
above 10 micrograms per deciliter (μg/dL) of
whole blood are at risk of adverse health effects.

Fortunately, lead poisoning in children is 
treatable in the sense that there are medical 
interventions, known as chelation treatments, 
that can help a child to excrete the lead that has 
been ingested. Until recently, chelation treat- 
ment of children with high levels of blood lead 
was administered by injection and required 
hospitalization. A new chelating agent, suc- 
cimer, enhances urinary excretion of lead and 
has the distinct advantage that it can be given 
orally, rather than by injection. In the 1990s, 
the Treatment of Lead-Exposed Children (TLC) 
Trial Group conducted a placebo-controlled, 
randomized trial of succimer in children with 
confirmed blood lead levels of 20–44 μg/dL,
levels well above the CDC’s threshold for
concern about the adverse health effects of
exposure to lead (Treatment of Lead-Exposed
Children (TLC) Trial Group, 2000; Rogan et al.,
2001). The children were aged 12–33 months at
enrollment and lived in deteriorating inner-city
housing. The mean age of the children at randomization
was 2 years and their mean blood
lead level was 26 μg/dL. Children received up
to three 26-day courses of succimer or placebo 
and were followed for 3 years. We will focus 
on longitudinal data on blood lead levels mea- 
sured at baseline, week 1, week 4, and week 
6 on a subset of 100 children from this study 
who were randomized to placebo (control) or 
succimer (active treatment). 


1.2 MIT Growth and Development Study 


The second illustrative example is from a 
prospective longitudinal study on body fat
accretion in a cohort of 162 girls from the MIT
Growth and Development Study (Bandini et al., 
2002; Phillips et al., 2003). At the start of the 
study, all of the girls were premenarcheal and 
nonobese, as determined by a triceps skinfold 
thickness less than the 85th percentile. All girls 
were followed over time according to a sched- 
ule of annual measurements until four years 
after menarche. The final measurement was 
scheduled on the fourth anniversary of their 
reported date of menarche. At each examina- 
tion, a measure of body fatness was obtained 
based on bioelectric impedance analysis. One 
of the goals of the study was to examine changes 
in body fat accretion before and after menarche. 
For the purposes of analyses, the “time” 
of measurement is calibrated as “time since 
menarche”; therefore it can be positive (for mea- 
surements after the reported date of menarche) 
or negative (for measurements prior to menar- 
che). Thus, although the measurement protocol 
is the same for all girls if the timing of mea- 
surement is defined as the time since the base- 
line measurement, it is highly irregular when 
the timing of measurements is defined as the 
time since a girl experienced menarche. Rel- 
ative to menarche, each girl is measured at a 
unique set of occasions, with few observation 
times coinciding. In this data set there are a total 
of 1049 individual percent body fat measure- 
ments, with an average of 3.1 measurements 
during the premenarcheal period and 3.5 mea- 
surements during the postmenarcheal period. 
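
The recalibration of time described above can be sketched in a few lines of Python. The ages below are hypothetical values invented for illustration (they are not MIT Growth and Development Study data), and the names `exam_ages`, `menarche_age`, and `time_since` are assumptions of this sketch.

```python
# Hypothetical ages (years) at each annual examination and at menarche.
# These numbers are invented for illustration only.
exam_ages = {
    "girl_1": [10.2, 11.2, 12.2, 13.2],
    "girl_2": [9.5, 10.5, 11.5, 12.5],
}
menarche_age = {"girl_1": 12.6, "girl_2": 11.1}

# Time since menarche: negative before menarche, positive afterwards.
time_since = {
    girl: [round(age - menarche_age[girl], 1) for age in ages]
    for girl, ages in exam_ages.items()
}

# Occasions shared by both girls on the recalibrated time scale.
shared = set(time_since["girl_1"]) & set(time_since["girl_2"])

print(time_since)
print("shared occasions:", sorted(shared))
```

Both hypothetical girls follow the same annual schedule, yet on the time-since-menarche scale they share no measurement occasions, which is the sense in which the design becomes unbalanced.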


2 Graphical exploration of longitudinal data


The formal statistical analysis of longitudinal 
data should always be preceded by simple 
graphical displays of the data. A natural way to 
display longitudinal data is through the use of a 
standard scatter-plot, with the responses on the 
vertical axis and the measurement times on the 
horizontal axis. We refer to such a plot as a time 
plot. Unfortunately, a time plot of longitudinal
data may not always be very helpful or readily
interpretable. In many longitudinal studies
there is a fixed set of common measurement
occasions for all study participants; we refer to
this as a balanced design. For example, the TLC
trial discussed earlier has a balanced design, 
with all children measured at the same set of 
occasions: baseline (or week 0), week 1, week 4, 
and week 6. In a balanced design, a time plot of 
the raw data results in many overlapping data 
points at each measurement occasion. This can 
make it difficult to determine any trends in the 
mean response over time. In addition, such a 
plot does not indicate which data points repre- 
sent repeated measurements on the same indi- 
vidual. To circumvent the latter problem, the 
time plot can be supplemented by connecting 
successive repeated measures on the same individual
with straight lines. However, the resulting
line segments do not necessarily enhance
the time plot; indeed, more often than not, they
produce a “spaghetti” plot that is not very
informative about overall trends in the response
over time.

Some of the aforementioned problems with 
the time plot of longitudinal data can be illus- 
trated using data from the Treatment of Lead- 
Exposed Children Trial. Figure 13.1 displays 
a time plot of the blood lead level data for 
the group of children randomized to succimer. 
Because the data points overlap at the common 
set of four measurement occasions, it is diffi- 
cult to discern any pattern in the mean response 
trend over time. Perhaps the only useful source 
of information provided by this simple time 
plot of the raw data concerns the presence of 
outliers in the data and whether the variability 
in the data changes discernibly with time. For 
example, there appears to be an outlying observation
at week 6, corresponding to a blood lead
level of 64 μg/dL.

Figure 13.1 Time plot of blood lead levels at
baseline (week 0), week 1, week 4, and week 6 for
children from the succimer group in the TLC trial

In Figure 13.2 the time plot of blood lead levels
is supplemented with line segments joining
successive measures on the same individual.
Figure 13.2 is only marginally more informative
about trends in the mean response over time
than Figure 13.1. The appeal of joining line segments
is that it allows us to distinguish which
data points represent repeated measurements 
on the same individual. However, it can be 
very difficult to track the response profile of 
any particular individual when the plot con- 
tains longitudinal data on many individuals. 
With too much “spaghetti”, the information 
conveyed by this plot is difficult to digest. As 
a result, it may be more useful to present the 
time plot with joined line segments for only 
a relatively small random sample of the study 
participants. 

Figure 13.2 Time plot, with joined line segments,
of blood lead levels at baseline (week 0), week 1,
week 4, and week 6 for children from the succimer
group in the TLC trial

Because of the aforementioned problems with
time plots of the raw data, it is usually more
informative to display a time plot of the
mean response, with successive means joined
by straight lines. In addition, time plots of
the mean response for different levels of discrete
covariates (e.g., different intervention or
treatment groups) can be overlaid on the
same graph. The construction of such a plot 
is relatively straightforward when the tim- 
ing of the repeated measures is the same for 
all individuals. The time plots can also be 
enhanced by including standard error bars for 
the mean response at each occasion. For exam- 
ple, Figure 13.3 displays the mean blood lead 
levels in the succimer and placebo groups at 
weeks 0, 1, 4, and 6. From this simple dis- 
play it is readily apparent that the effect of 
succimer is greater after one week of treatment 
and that there appears to be a rebound effect 
thereafter. Overall, a graphical display of the 
mean response can be quite enlightening and 
can provide the basis for choosing an appro- 
priate model for the analysis of change over 
time. For example, the time plot of the mean
response in Figure 13.3 suggests that the analysis
of the blood lead levels at all four occasions
may require nonlinear (e.g., quadratic) or perhaps
piecewise linear trends over time.

Figure 13.3 Time plot of the mean blood lead
levels at baseline (week 0), week 1, week 4, and
week 6 in the succimer and placebo groups
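
For a balanced design, the mean response and its standard error at each occasion can be computed group by group before plotting. The following is a minimal sketch in Python; the records are made-up values (not TLC trial data), and the function name `mean_profile` and the record layout are assumptions of this sketch.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical long-format records: (child_id, group, week, blood lead in ug/dL).
# These values are invented for illustration; they are not TLC trial data.
records = [
    (1, "succimer", 0, 26.0), (1, "succimer", 1, 14.0),
    (1, "succimer", 4, 16.0), (1, "succimer", 6, 20.0),
    (2, "succimer", 0, 28.0), (2, "succimer", 1, 12.0),
    (2, "succimer", 4, 18.0), (2, "succimer", 6, 22.0),
    (3, "placebo", 0, 25.0), (3, "placebo", 1, 24.0),
    (3, "placebo", 4, 23.0), (3, "placebo", 6, 23.0),
    (4, "placebo", 0, 27.0), (4, "placebo", 1, 26.0),
    (4, "placebo", 4, 25.0), (4, "placebo", 6, 23.0),
]

def mean_profile(rows):
    """Return {group: [(week, mean, standard error), ...]} sorted by week."""
    cells = defaultdict(list)
    for _id, group, week, y in rows:
        cells[(group, week)].append(y)
    profile = defaultdict(list)
    for (group, week), ys in sorted(cells.items()):
        # Standard error of the mean; undefined for a single observation.
        se = stdev(ys) / sqrt(len(ys)) if len(ys) > 1 else float("nan")
        profile[group].append((week, mean(ys), se))
    return dict(profile)

profiles = mean_profile(records)
for group, points in profiles.items():
    for week, m, se in points:
        print(f"{group:9s} week {week}: mean = {m:5.1f}, SE = {se:4.2f}")
```

Each group's list of (week, mean, SE) triples is what would be drawn as a line with error bars in a plot of the mean response over time.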

The construction of time plots of the mean 
response is less straightforward when a covari- 
ate of interest is quantitative (e.g., dose of drug). 
For the purposes of producing a graphical dis- 
play of the mean response trend, one simple, 
but often quite effective, approach is to con- 
struct a small number of groupings or “refer- 
ence categories” for the quantitative covariate 
in question. For ease of exposition, we consider 
three groupings of the quantitative covariate 
that can be denoted as “low”, “medium”, and 
“high”. Given this set of reference categories, 
the construction of the time plot of the mean 
response trend can proceed along exactly the 
same lines as for the case of a truly discrete 
covariate having only three levels. That is, we
can simply overlay the mean response trends
for the different reference categories. However,
it must be acknowledged that the number and
choice of reference groups are, to some extent,
arbitrary.
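
One way to form the three reference categories is to cut the covariate at its empirical tertiles. The sketch below is only one possible convention, which underlines the arbitrariness noted above; the function name `tertile_labels` and the doses are hypothetical.

```python
def tertile_labels(values):
    """Label each value 'low', 'medium', or 'high' using empirical tertiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points: the values one-third and two-thirds of the way through
    # the sorted list (one of several reasonable conventions).
    cut1 = ordered[n // 3]
    cut2 = ordered[(2 * n) // 3]
    labels = []
    for v in values:
        if v < cut1:
            labels.append("low")
        elif v < cut2:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

# Hypothetical doses (mg) for nine subjects; not from any study in this chapter.
doses = [5, 50, 20, 10, 40, 30, 15, 45, 25]
groups = tertile_labels(doses)
print(list(zip(doses, groups)))
```

Once each subject carries a group label, the mean response trend can be computed and plotted within each label exactly as for a discrete covariate.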

So far, our discussion has assumed that many, 
if not all, individuals are measured at the same 
set of occasions. When the times of measure- 
ment are not the same, construction of time 
plots of the mean response can pose difficul- 
ties due to sparseness of data at any particular 
occasion. For example, Figure 13.4 displays a 
time plot (time relative to age of menarche), 
with joined line segments, of longitudinal data 
on percent body fat in the cohort of 162 girls 
from the MIT Growth and Development Study 
(Bandini et al., 2002; Phillips et al., 2003). Here, 
because each girl is measured at a unique set of 
occasions, with few observation times coincid- 
ing, construction of a plot of the mean response 
over time is difficult due to sparseness of data at 
any particular time. For example, it is difficult
to precisely estimate the mean percent body fat
2 years after menarche because there are so few
observations at that particular time. Moreover,
it is difficult to discern whether the changes in
percent body fat in the premenarcheal period
are similar to the changes in the postmenarcheal
period from this “spaghetti” plot.

Figure 13.4 Time plot of percent body fat against
time, relative to age of menarche (in years)

In cases where the design is highly unbal- 
anced (i.e., repeated measurements are not 
obtained at a common set of occasions), it is 
helpful to produce a “smoothed” plot of the 
mean response trend over time. A smooth plot 
of the trend can be obtained using a vari- 
ety of “smoothing techniques”. Many of these 
smoothing techniques approach the estimation 
of the mean response at any distinct time by 
considering not only the observations at that 
occasion but also “neighboring” observations. 
That is, the estimated mean is based on obser- 
vations taken before, at, and after the time of 
interest. Typically, the mean response at any 
time, say t, is taken to be a weighted average
of the observations in some close proximity or 
neighborhood of time t. 

One popular smoothing technique is locally 
weighted regression or lowess (Cleveland,
1979). The lowess estimate of the mean
response at time t is determined by fitting a
straight line to the observations that fall within 
a “window” centered at time t. The fitted regres- 
sion line is obtained using a robust regression 
technique that gives more weight to observa- 
tions close to the center of the window and 
that also down-weights potential outliers. The 
entire lowess curve is obtained by moving a 
window of fixed width from the first mea- 
surement occasion to the last, and repeating 
the process at every time. Figure 13.5 displays 
a lowess curve for the percent body fat data 
described earlier. Unlike the time plot of the 
raw data in Figure 13.4, the lowess curve is 
informative about changes in percent body fat 
before and after menarche. The smooth curve 
produced by the lowess procedure reveals that 
the mean response increases gently during the 
premenarcheal period and then rises steeply 
during the postmenarcheal period.

Figure 13.5 Time plot of percent body fat against
time, relative to age of menarche (in years), with
lowess smoothed curve
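
The core step of lowess, a locally weighted straight-line fit within a moving window, can be sketched as follows. This is a simplified illustration and not Cleveland's full procedure: it uses a nearest-neighbour window controlled by a fraction `frac`, applies tricube weights, and omits the robustness iterations that down-weight outliers. The function names `local_linear_smooth` and `_weighted_line` are assumptions of this sketch.

```python
def _weighted_line(xs, ys, ws, x0):
    """Fit a weighted least-squares line and return its value at x0."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:  # all weight on a single x value: use the weighted mean
        return ybar
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)

def local_linear_smooth(x, y, frac=0.5):
    """Return the locally weighted linear fit evaluated at each x[i]."""
    n = len(x)
    k = min(n, max(3, int(round(frac * n))))  # neighbours defining each window
    fitted = []
    for x0 in x:
        dists = [abs(xj - x0) for xj in x]
        # Window half-width: distance to the k-th nearest observation.
        h = sorted(dists)[k - 1] or 1e-12
        # Tricube weights: largest at the window centre, zero at its edge.
        ws = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        fitted.append(_weighted_line(x, y, ws, x0))
    return fitted

# Hypothetical data: a straight line with one aberrant observation at x = 5.
x = [float(v) for v in range(11)]
y = [float(v) for v in range(11)]
y[5] = 100.0
smooth = local_linear_smooth(x, y, frac=0.5)
print([round(v, 2) for v in smooth])
```

Because the robustness step is omitted, the aberrant value still pulls the curve upward near x = 5; the full lowess procedure would re-fit with reduced weight on observations that have large residuals.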


In summary, graphical techniques have an 
important role in the early stages of longitudi- 
nal data analysis. Although time plots of the 
raw data, with or without joined line segments, 
can be difficult to interpret, especially when 
the number of observations is relatively large, 
time plots of the mean response can be very 
informative. Time plots of the mean response 
are easy to construct when the study design is 
balanced over time; for highly unbalanced data, 
various smoothing techniques can be used. 
A time plot of the mean response can provide 
useful insights about the general patterns of 
change over time and possible functional forms 
for the covariates. For example, Figure 13.3 sug- 
gests that the analysis of the blood lead levels at 
all four occasions may require nonlinear trends 
over time, especially for the succimer group. 
Similarly, Figure 13.5 suggests that the growth 
rate for percent body fat prior to menarche is 
relatively flat and might be well-approximated 
by a linear trend; however, after menarche, the
growth rate increases steeply. Thus, any model
for change in percent body fat over time will 
need to incorporate different trajectories in the 
pre- and postmenarcheal periods. 


3 Graphical model-checking based on residuals


Next, we consider methods for assessing the 
adequacy of models for longitudinal data. Typ- 
ically, the analysis of longitudinal data focuses 
on changes in the mean response over time, 
and on the relation of these changes to covari- 
ates. For that reason, we concentrate on resid- 
ual diagnostics for assessing the adequacy of 
the model for the mean response. Methods of 
assessing the model for the covariance are men- 
tioned only briefly at the end of this chapter; 
readers interested in the latter topic are directed 
to Chapter 9 of Fitzmaurice, Laird and Ware 
(2004). Also, for ease of exposition, we focus 
on the assessment of models for longitudinal 
data where the response variable is continu- 
ous; similar techniques can be applied when 
the response is binary, ordinal or count data. 

Methods for residual analyses are well devel- 
oped for standard regression settings with inde- 
pendent observations on a univariate response; 
see Cook and Weisberg (1982) for a compre- 
hensive description of techniques for resid- 
ual analysis. In principle, many of the same 
properties of residual analysis can be extended 
to the longitudinal setting, with relatively 
minor modifications. In this section we also 
consider some recently developed techniques, 
based on aggregating residuals, that put resid- 
ual diagnostics on a somewhat more objective 
footing. 


3.1 Raw residuals 


Before we begin our discussion of residuals we 
must introduce some notation. We assume that 
N subjects are measured repeatedly over time. 
We let $Y_{ij}$ denote the response variable for the
$i$th subject on the $j$th measurement occasion.
In principle, the response variable could be continuous,
binary, or a count; however, for ease
of exposition, we focus on the case where $Y_{ij}$ is
continuous. To accommodate unbalanced data,
we assume that there are $n_i$ repeated measurements
of the response on the $i$th subject and that
each $Y_{ij}$ is observed at time $t_{ij}$. The response
variables for the $i$th subject can be grouped into
an $n_i \times 1$ vector

$$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})',$$

where the vectors of responses, $Y_i$, are assumed
to be independent of one another (but the
repeated measures on the same subject are
emphatically not assumed to be independent).
Associated with each response, $Y_{ij}$, there is a
$p \times 1$ vector of covariates

$$X_{ij} = (X_{ij1}, X_{ij2}, \ldots, X_{ijp})'.$$

The vectors of covariates at the $n_i$ occasions can
be grouped into an $n_i \times p$ matrix denoted by $X_i$.
We assume the following linear model for the
vector of continuous responses, $Y_i$,

$$Y_i = X_i\beta + e_i, \qquad (1)$$

where the unknown regression parameters can
be grouped together into a $p \times 1$ vector,
$\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$, and
$e_i = (e_{i1}, e_{i2}, \ldots, e_{in_i})'$ is an
$n_i \times 1$ vector of random errors. The random
errors, $e_i$, have mean zero and represent deviations
of the responses from their corresponding
predicted means,

$$E(Y_i \mid X_i) = X_i\beta = \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip}, \qquad (2)$$

where $X_{ik}$ denotes the $k$th column of $X_i$.
Typically, although not always, $X_{ij1} = 1$ for all
$i$ and $j$, and then $\beta_1$ is the intercept term in the
model.

Thus far we have made no distributional
assumptions about $Y_i$. The only assumption
made is that the mean of the longitudinal
response vector is related to the covariates via
the linear regression model given above. When
$Y_i$ is a vector of continuous responses, it is commonly
assumed that it has a multivariate normal
distribution, with mean response vector

$$E(Y_i) = \mu_i = X_i\beta$$

and covariance matrix

$$\Sigma_i = \mathrm{Cov}(Y_i).$$

Recall that the multivariate normal distribution
is completely specified by the vector of means,
$\mu_i$, and the covariance matrix, $\Sigma_i$. The covariance
can be modelled directly or via the introduction
of random effects (e.g., linear mixed
effects models). This completes our specification
of the model for $Y_i$.

Given a regression model for the mean
response, specified by equation (1), we can
define a vector of residuals for each individual,

$$r_i = Y_i - X_i\hat{\beta}. \qquad (3)$$

The vector of residuals has mean zero and provides
an estimate of the vector of errors,

$$e_i = Y_i - X_i\beta.$$

These residuals can be used to check for
any systematic departures from the regression
model for the mean response. For example, a
scatter-plot of the residuals

$$r_{ij} = Y_{ij} - X_{ij}'\hat{\beta}$$

against the predicted mean response

$$\hat{\mu}_{ij} = X_{ij}'\hat{\beta}$$

can be examined for the appearance of any systematic
trend. The fitting of a smooth curve
(e.g., a lowess curve) to the scatter-plot can often
help in judging whether curvature is present.
In a correctly specified model for the mean
response, the plot should display no systematic
pattern, with a more or less random scatter
around a constant mean of zero. Similarly,
scatter-plots of the residuals against selected
covariates from the model for the mean can be
examined for any systematic trends. Such a trend
may indicate the omission of a quadratic term
or the need for transformation of the covariate.

For most practical purposes, graphical dis- 
plays of the residuals can be used to detect dis- 
crepancies in the model for the mean response 
or the presence of outlying observations that 
require further investigation. However, there are 
two properties of the residuals from an analy- 
sis of longitudinal data that set them apart from 
residuals in a standard regression with inde- 
pendent observations on a univariate response. 
First, the components of the vector of residuals,

$$r_i = Y_i - X_i\hat{\beta},$$

are correlated and do not necessarily have
constant variance. Because the residuals have
approximate covariance matrix
$\mathrm{Cov}(r_i) \approx \mathrm{Cov}(e_i) = \Sigma_i$,
this has important implications
for the examination of plots of the residuals.
In particular, standard residual diagnostics for examining
either the homogeneity of the residual
variance or autocorrelation among the residuals
should be avoided altogether. Second, although
residuals from a univariate linear regression are 
uncorrelated with the covariates, the residu- 
als from a regression analysis of longitudinal 
data may be correlated with the covariates. As 
a result, there may be an apparent systematic 
trend in the scatter-plot of the residuals against 
a selected covariate. 


3.2 Transformed residuals


To circumvent some of the aforementioned
problems, we can transform the residuals so
that they have constant variance and zero correlation,
thereby mimicking residuals from a
standard linear regression. This can be achieved
using a well-known technique called the
Cholesky decomposition (or Cholesky factorization).
Given an estimate of the approximate
covariance matrix for the residuals, $\hat{\Sigma}_i$, the
Cholesky decomposition of $\hat{\Sigma}_i$ can be used to
create a lower triangular matrix, $L_i$, such that

$$\hat{\Sigma}_i = L_i L_i'.$$

Note that a lower triangular matrix is simply
one with all zeros above the diagonal. We can
then use the matrix $L_i$ or, more specifically, $L_i^{-1}$,
to take us from a set of correlated residuals with
heterogeneous variances to a set of transformed
residuals,

$$r_i^* = L_i^{-1} r_i = L_i^{-1}(Y_i - X_i\hat{\beta}), \qquad (4)$$

that are uncorrelated and have unit variance.

Given the set of transformed residuals, $r_i^*$, all
of the usual residual diagnostics for standard
linear regression can be applied. For example,
we can construct a scatter-plot of the transformed
residuals, $r_{ij}^*$, versus the transformed
predicted values, $\hat{\mu}_{ij}^*$, where

$$\hat{\mu}_i^* = L_i^{-1}\hat{\mu}_i = L_i^{-1} X_i\hat{\beta}.$$

In a correctly specified model, this plot
should display no systematic pattern, with
a random scatter around a constant mean of
zero and with a constant range for varying
$\hat{\mu}_{ij}^*$. Similarly, we can construct a scatter-plot
of the transformed residuals versus selected
transformed covariates. With longitudinal
data, a scatter-plot of the transformed residuals
versus transformed time (or age) can be
particularly useful for assessing the adequacy
of the model assumptions about patterns of
change in the mean response over time. We
note that standard linear regression programs
can be used to automate the production of
residual diagnostics. That is, standard residual
diagnostics can be applied after refitting a
standard linear regression of $Y_i^*$ on $X_i^*$, where
$Y_i^* = L_i^{-1} Y_i$ and $X_i^* = L_i^{-1} X_i$. For a more detailed
discussion of the generalization of residual 
diagnostics to longitudinal data, the interested 
reader is referred to articles by Waternaux et al. 
(1989) and Waternaux and Ware (1991). 
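
The transformation in equation (4) can be sketched for a single subject as follows. The covariance matrix and residuals are hypothetical numbers chosen for illustration, and the helper functions `cholesky` and `forward_solve` are written out here only to keep the sketch self-contained; in practice a linear algebra library would normally supply them.

```python
from math import sqrt

def cholesky(a):
    """Return lower-triangular L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b by forward substitution, i.e. z = L^{-1} b."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

# Hypothetical estimated covariance matrix and raw residuals for one subject
# with three repeated measurements (illustrative numbers only).
sigma_hat = [[4.0, 2.0, 1.0],
             [2.0, 4.0, 2.0],
             [1.0, 2.0, 4.0]]
r = [2.0, -1.0, 0.5]

L = cholesky(sigma_hat)
r_star = forward_solve(L, r)  # transformed residuals r* = L^{-1} r
print([round(v, 4) for v in r_star])
```

Solving the triangular system avoids forming the inverse of $L_i$ explicitly. Applying the same `forward_solve` step to the response vector and to each column of the covariate matrix would give the transformed response and covariates used in the refitted regression.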

Finally, the transformed residuals also make 
it somewhat easier to identify outliers, both 
outlying observations and outlying individuals. 
Outlying observations may be indicated by large 
residuals (e.g., a residual with absolute value 
greater than 2 or 3). Because we are focusing 
on the most extreme values of the residuals, 
the distribution of these extremes is some- 
what more complicated than a standard nor- 
mal distribution. In general, we recommend 
careful examination of the most extreme resid- 
uals while recognizing that extreme residuals 
will occur with predictable regularity; for exam- 
ple, with 1000 residuals, the expected num- 
ber of residuals whose absolute value exceeds 
2 is approximately 1000 x 0.05 = 50. Alterna- 
tively, an outlying individual can be identi- 
fied by first calculating a summary measure of 
multivariate distance between their observed 
and fitted responses, based on the Mahalanobis 
distance,

$$d_i = r_i^{*\prime} r_i^{*}. \qquad (5)$$

If the model is correctly specified, the distance
given by equation (5) has an approximate
chi-squared distribution with degrees of
freedom (df) equal to the dimension of $r_i^*$ (i.e.,
df = $n_i$, the number of repeated measurements
on the $i$th subject). Outlying individuals will
have distances, $d_i$, that have small associated
p-values. The p-values provide a common metric
for comparing and detecting large values of
$d_i$, corresponding to unusual or outlying individuals,
when the number of repeated measurements
varies across subjects. Once again, we
caution that the distribution of the extremes is
somewhat more complicated and it is important
to recognize that extremes will occur with
predictable regularity.
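
A minimal sketch of the distance in equation (5) and its chi-squared p-value is given below. The transformed residuals are hypothetical, and `chi2_sf` is a small helper written for this sketch (valid for positive integer degrees of freedom); a statistical library would ordinarily provide the chi-squared tail probability.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared_df > x) for a positive integer df."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # df = 2: exponential tail
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1: two-sided normal tail
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def mahalanobis_check(r_star):
    """Return (d, p-value) for one subject's transformed residuals."""
    d = sum(v * v for v in r_star)
    return d, chi2_sf(d, len(r_star))

# Hypothetical transformed residuals for two subjects with different n_i.
subjects = {"A": [1.0, -1.2, 0.6], "B": [2.5, -2.0, 1.5, 2.2]}
for sid, rs in subjects.items():
    d, p = mahalanobis_check(rs)
    print(f"subject {sid}: d = {d:.2f}, df = {len(rs)}, p = {p:.4f}")
```

Because the two hypothetical subjects have different numbers of measurements, their distances are not directly comparable, whereas their p-values are.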

To illustrate the use of raw and transformed 
residuals, we will consider assessing the ade- 
quacy of a longitudinal model for the body 
fat accretion data from the MIT Growth and 
Development Study. Recall that the data are 
from a prospective longitudinal study examin- 
ing changes in body fat before and after menar- 
che in a cohort of 162 girls. For the analysis of 
these data, “time” was coded as time since age 
of menarche and could be positive or negative. 
We consider the hypothesis that percent body
fat increases linearly with age, but with different
slopes before and after menarche. Specifically,
we assume that each girl has a piecewise
linear spline growth curve with a knot at the
time of menarche and fit the following linear
mixed effects model,

$$E(Y_{ij} \mid b_{1i}, b_{2i}, b_{3i}) = \beta_1 + \beta_2 t_{ij} + \beta_3 (t_{ij})_+ + b_{1i} + b_{2i} t_{ij} + b_{3i} (t_{ij})_+,$$

where $t_{ij}$ denotes the time of the $j$th measurement
on the $i$th subject before or after menarche
(i.e., $t_{ij} = 0$ at menarche), $(t_{ij})_+ = t_{ij}$ if
$t_{ij} > 0$ and $(t_{ij})_+ = 0$ if $t_{ij} \le 0$. The random
effects, $b_{1i}, b_{2i}, b_{3i}$, are assumed to have a multivariate
normal distribution, with zero mean.
(An excellent review of linear mixed effects
models with piecewise linear trends can be
found in Naumova et al. (2001).) In this model,
each girl’s growth curve can be described 
with an intercept and two slopes, one slope 
for the changes in response before menarche, 
another slope for the changes in response after 
menarche. 
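
The role of the truncated term $(t_{ij})_+$ can be seen by evaluating the population mean curve on either side of the knot. The sketch below uses hypothetical coefficients chosen only to show the shape of the curve; the function names are assumptions of this sketch.

```python
def positive_part(t):
    """(t)_+ : equals t when t > 0 and 0 otherwise."""
    return t if t > 0 else 0.0

def mean_response(t, beta1, beta2, beta3):
    """Population mean beta1 + beta2 * t + beta3 * (t)_+ at time t since menarche."""
    return beta1 + beta2 * t + beta3 * positive_part(t)

# Hypothetical coefficients, chosen only to illustrate the piecewise shape.
b1, b2, b3 = 20.0, 0.5, 2.0

# Change in the mean per year before and after the knot at t = 0.
slope_before = mean_response(-1.0, b1, b2, b3) - mean_response(-2.0, b1, b2, b3)
slope_after = mean_response(2.0, b1, b2, b3) - mean_response(1.0, b1, b2, b3)

print("slope before menarche:", slope_before)  # equals beta2
print("slope after menarche:", slope_after)    # equals beta2 + beta3
```

The slope before the knot is $\beta_2$ and the slope after it is $\beta_2 + \beta_3$, so $\beta_3$ measures the change in slope at menarche.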

The restricted maximum likelihood (REML)
estimates of the fixed effects are displayed
in Table 13.1.

Table 13.1 Estimated regression coefficients (fixed
effects) and standard errors for the piecewise
linear model for the percent body fat data

Variable      Estimate    SE        Z
Intercept     21.3614     0.5646    37.84
Time           0.4171     0.1572     2.65
(Time)$_+$     2.0471     0.2280     8.98

Based on the magnitude of
the estimate of $\beta_3$, relative to its standard
error, it can be concluded that there is a significant
difference between the slopes before
and after menarche. In particular, the estimated
premenarcheal slope is rather shallow
(0.42) and indicates that the annual rate of
body fat accretion is less than 0.5%. In contrast,
the estimated postmenarcheal slope is
2.46 (2.047 + 0.417) and indicates that the
annual rate of body fat accretion is approximately
2.5%, almost six times higher than
the corresponding rate in the premenarcheal
period.
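
The arithmetic behind these statements can be checked directly from the estimates and standard errors reported in Table 13.1. The sketch below treats the ratio of each estimate to its standard error as the test statistic; the variable names are assumptions of this sketch.

```python
# Estimates and standard errors as reported in Table 13.1.
estimates = {
    "Intercept": (21.3614, 0.5646),
    "Time": (0.4171, 0.1572),
    "(Time)+": (2.0471, 0.2280),
}

# Ratio of each estimate to its standard error.
z = {name: est / se for name, (est, se) in estimates.items()}

pre_slope = estimates["Time"][0]                  # slope before menarche
post_slope = pre_slope + estimates["(Time)+"][0]  # slope after menarche
ratio = post_slope / pre_slope

for name, value in z.items():
    print(f"{name:10s} estimate/SE = {value:.2f}")
print(f"postmenarcheal slope = {post_slope:.4f}")
print(f"ratio of slopes      = {ratio:.2f}")
```

The ratios computed from the rounded table entries agree with the tabulated values up to rounding of the published estimates, and the ratio of the two slopes is just under six.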

Next we use residual diagnostics to assess
the adequacy of the fitted model. Based on
the Cholesky decomposition of the estimated
covariance matrix, $\hat{\Sigma}_i$, we can calculate transformed
residuals,

$$r_i^* = L_i^{-1} r_i = L_i^{-1}(Y_i - X_i\hat{\beta}),$$

where $\hat{\Sigma}_i = L_i L_i'$. For illustrative purposes, we
also examine the untransformed residuals and
compare the diagnostic plots based on these two
types of residuals.

Normal quantile plots of the transformed 
and untransformed residuals are presented in 
Figure 13.6 and do not indicate any systematic 
departures from a straight line. There is no evi- 
dence to suggest any discernible skewness and 
the normal assumption appears to be tenable. 
The quantile plot of the transformed residu- 
als does reveal one very extreme observation. 
However, the number of extreme residuals high- 
lighted by Figure 13.6 is not more than what 
we would expect due to chance, given a total of 
1049 observations. 

Figure 13.6 Normal quantile plot of (a) the transformed residuals, and (b) the untransformed residuals, for
the percent body fat data

Next we consider scatter-plots of the transformed
and untransformed residuals versus
the transformed and untransformed predicted
values, respectively. The scatter-plots of the
residuals in Figure 13.7 display no obvious systematic
pattern, with a random scatter around a
constant mean of zero. However, when lowess
smoothed curves are superimposed on the
scatter-plots, they do reveal some apparent curvature.
Focusing on the transformed residuals,
there appears to be a quadratic trend, although
the fall in the lowess curve at the largest values
of the transformed predicted values should
be cautiously interpreted as the fitted curve is
based on few observations at the extremities
and is therefore likely to be unreliable in that
region.

Figure 13.7 Scatter-plot of (a) the transformed residuals versus transformed predicted values, and (b) the
untransformed residuals versus predicted values, for the percent body fat data

Because of the suggestion of curvature in 
Figure 13.7, we next examine scatter-plots 
of the (transformed) residuals versus (trans- 
formed) time (see Figure 13.8). These scatter- 
plots of the transformed and untransformed 
residuals suggest curvature at (untransformed) 
times corresponding to approximately 2 to 
4 years post-menarche. The pattern is more 
apparent in the scatter-plot of the transformed 
residuals and can no longer be discounted 
due to sparseness of the observations at the
extremities; here the plots of the transformed
and untransformed data give somewhat differ- 
ent impressions. The curvature in the scatter- 
plots suggests that the model for the mean 
response might be improved by the inclusion 
of a quadratic trend in the postmenarcheal 
period. 

Figure 13.8 Scatter-plot of (a) the transformed residuals versus transformed time, and (b) the untransformed
residuals versus time, for the percent body fat data

Next, we illustrate how the transformed
residuals can be used to identify unusual
individuals. We can calculate the Mahalanobis
distance,

$$d_i = r_i^{*\prime} r_i^{*},$$

for each girl and then compare the values to reference
chi-squared distributions with degrees
of freedom (df) equal to the dimension of $r_i^*$ (i.e.,
df = $n_i$, the number of repeated measurements
obtained on each girl). For each girl, we calculated
$d_i$ and its associated p-value. There were
7 girls whose $d_i$ yielded p-values less than 0.05
and 2 girls with p-values less than 0.01. Given
that the sample comprises 162 girls, distances
of these magnitudes are to be expected
by chance alone.
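
The statement that 7 and 2 flagged girls are consistent with chance can be made concrete with a rough calculation. The sketch below assumes, as an approximation, that under a correctly specified model each girl's p-value falls below a threshold independently with probability equal to that threshold; the helper `binom_tail` is written for this sketch.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

n_girls = 162
# (threshold, number of girls observed below that threshold in the text)
for alpha, observed in [(0.05, 7), (0.01, 2)]:
    expected = n_girls * alpha
    prob = binom_tail(n_girls, alpha, observed)
    print(f"p < {alpha}: expected {expected:.2f}, observed {observed}, "
          f"P(at least {observed}) = {prob:.2f}")
```

With 162 girls, about 8.1 p-values below 0.05 and about 1.6 below 0.01 would be expected under the model, so the observed counts of 7 and 2 are unremarkable.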


3.3 Aggregating residuals 


So far, much of the discussion of residual diagnostics
has focused on graphical techniques for
assessing the adequacy of the model for the mean
response. With appropriate transformations, we 
have seen that residual diagnostics developed 
for standard linear regression can be extended 
to the longitudinal setting. An acknowledged 
difficulty with conventional residual diagnos- 
tics is that they are somewhat subjective in 
nature. What appears to be a random scatter 
to one individual, might be considered evi- 
dence of systematic trend to another. That is, 
it can be very difficult to discern whether an 
apparent trend in a scatter-plot of the residual 
reflects some aspect of model misspecification 
or is simply a reflection of natural variation. 
McCullagh and Nelder (1989, pp. 392-393) aptly 
summarize this problem when they state that 
“the practical problem is that any finite set 
of residuals can be made to yield some kind 
of pattern if we look hard enough, so that 
we have to guard against over-interpretation.” 
Recently, model-checking techniques based
on “cumulative sums” and “moving sums” of
residuals have been developed to help discern
the “signal” from the “noise”. The basic idea is
to aggregate the residuals over certain coordinates.
The coordinates typically used for these
sums of residuals are the individual covariates
(e.g., $X_{ijk}$, the $k$th covariate) and the fitted
values, $X_{ij}'\hat{\beta}$. The advantage of working with
sums of residuals, rather than raw or trans- 
formed residuals, is that a reference distribution 
is available to ascertain their natural variation. 
That is, we can compare the observed sums of 
the residuals, both graphically and numerically, 
to a reference distribution under the assump- 
tion of a correctly specified model for the mean. 
This allows us to determine whether any appar- 
ent pattern is evidence of a systematic trend or 
simply due to natural variation. This removes a 
large degree of subjectivity from the assessment 
of graphical displays of residuals and places 
residual diagnostics on a more objective footing. 

Recall that the raw residuals are defined as
the difference between the observed and fitted
values of the response,

$$ r_{ij} = Y_{ij} - X_{ij}'\hat{\beta}. $$

If the model for the mean is correctly specified,
these residuals are centered at zero. To
check the functional form of any covariate,
say $X_{ijk}$, the $k$th covariate, we can define the
cumulative sum of the residuals over values
of $X_{ijk}$,


$$ W_k(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(X_{ijk} \le x)\, r_{ij}, $$

where $I(\cdot)$ is the indicator function. For any
given $x$, $W_k(x)$ is the (scaled) sum of residuals for all
values of $X_{ijk}$ less than or equal to $x$. The process
$W_k(x)$ is a step function with possible jumps
(either increases or decreases) at all of the distinct
values for $X_{ijk}$. The cumulative sum of
residuals can be defined similarly with respect
to any other covariate. In addition, we can construct
the cumulative sum of residuals over the
fitted values, denoted by $W_f(x)$,


$$ W_f(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(X_{ij}'\hat{\beta} \le x)\, r_{ij}. $$


The cumulative sums, $W_k(x)$, can be used to
assess the functional form of the covariates. For
example, we can construct a plot of $W_k(x)$ versus
$x$, where for any value of $x$ on the horizontal
axis, the corresponding value of $W_k(x)$
on the vertical axis is the cumulative sum of
the residuals for covariate values of $X_{ijk}$ less
than or equal to $x$. Evidence of systematic trend
in this plot suggests that the functional form
of the covariate (e.g., linearity) is not correctly
specified and may indicate that a transformation
of the covariate or the inclusion of polynomials
is required. The cumulative sum, $W_f(x)$,
is useful for assessing the assumption of linearity
(or, more generally, the link function).
Any evidence of systematic trend in this plot
might suggest that either a transformation of
$Y$ (i.e., a transformation of the response) or
of $E(Y|X)$ (i.e., an alternative link function) is
necessary.
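A minimal Python sketch of the covariate-based cumulative sum may make the step-function behaviour concrete. It assumes the raw residuals have already been obtained from a fitted model and are stored in one flat list alongside the covariate values; the function name, argument names, and toy numbers are illustrative assumptions, not part of any statistical package.

```python
from math import sqrt


def cumulative_residual_sum(covariate, residuals, n_subjects):
    """Return the jump points x and the values of the cumulative sum at each x.

    covariate  : flat list of covariate values over all subjects and occasions
    residuals  : flat list of raw residuals, in the same order as `covariate`
    n_subjects : N, the number of subjects (used for the 1/sqrt(N) scaling)
    """
    pairs = sorted(zip(covariate, residuals))
    xs, ws = [], []
    running = 0.0
    for x, r in pairs:
        running += r
        if xs and xs[-1] == x:
            # Tied covariate value: the step at x absorbs this residual too.
            ws[-1] = running / sqrt(n_subjects)
        else:
            xs.append(x)
            ws.append(running / sqrt(n_subjects))
    return xs, ws


# Toy data (hypothetical): 4 subjects, 2 occasions each, residuals summing to zero.
toy_covariate = [1, 2, 3, 1, 2, 3, 4, 4]
toy_residuals = [2.0, -1.0, 1.0, 1.0, -3.0, 2.0, -1.0, -1.0]
jump_points, w_values = cumulative_residual_sum(toy_covariate, toy_residuals, 4)
```

Plotting `w_values` against `jump_points` as a step function gives the kind of curve described above. Because the toy residuals sum to zero, the curve returns to zero at the largest covariate value.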


If the assumed model for the mean response is
correct, then the cumulative sums of residuals
are centered at zero. Moreover, we can ascertain
the natural variation of the cumulative sum. In
particular, the distribution of the cumulative
sum can be approximated by that of a Gaussian
(or normal) process with zero mean whose
realizations can be generated by computer simulation.
That is, it is relatively straightforward
to generate realizations from the distribution of
the cumulative sum, under the assumption that
the model for the mean is correct; the technical
details are omitted here and the interested
reader is referred to Lin, Wei and Yang (2002).
Thus, in practical terms, the null distribution
of $W_k(x)$ (or $W_f(x)$) is approximated through
computer simulation of the zero-mean Gaussian
process, denoted by $\hat{W}_k(x)$ (and $\hat{W}_f(x)$, respectively).
Then, to assess whether any apparent
trend in the observed cumulative sum of residuals
reflects systematic trend rather than chance
fluctuations, we can superimpose a number of
realizations from the appropriate Gaussian process.
To the extent that the curves generated
from the null distribution tend to be closer to
and intersect zero more often than the observed
curve, this provides evidence of lack of fit. This
assessment can be put on a more formal footing
by comparing the maximum absolute value of
the observed cumulative sum to a large number
of realizations (say 10,000) from the null
distribution. By comparing $\max_x |W(x)|$, the maximum
absolute value of the observed cumulative
sum, to $\max_x |\hat{W}(x)|$ for each realization
from the null distribution, a p-value can be constructed
based on the proportion of times that
$\max_x |\hat{W}(x)| \ge \max_x |W(x)|$; the latter is referred to
as a “supremum” test and provides an omnibus
test of model adequacy with respect to the relevant
coordinate (e.g., a particular covariate or
the fitted values). If the p-value is very small
(say less than 0.05 or 0.01), there is evidence of
lack of fit and the model for the mean
can be improved.
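The last step, turning simulated realizations into a supremum p-value, can be sketched in a few lines of Python. This is only an illustration of the counting rule described above: it assumes the null realizations have already been generated by some other routine (the simulation of the Gaussian process itself is not shown), and the function and variable names are hypothetical.

```python
def supremum_p_value(observed_path, simulated_paths):
    """Proportion of simulated paths whose maximum absolute value is at least
    as large as the maximum absolute value of the observed path.

    observed_path   : values of the observed cumulative (or moving) sum
    simulated_paths : list of paths, each a list of values of one null realization
    """
    observed_max = max(abs(w) for w in observed_path)
    n_at_least = sum(
        1
        for path in simulated_paths
        if max(abs(w) for w in path) >= observed_max
    )
    return n_at_least / len(simulated_paths)


# Toy illustration (hypothetical numbers, not the body fat data).
toy_observed = [0.5, -2.0, 1.0]
toy_simulated = [[0.1, -0.3], [2.5, 0.0], [-1.9, 1.0], [0.0, -2.0]]
toy_p = supremum_p_value(toy_observed, toy_simulated)
```

With 10,000 realizations, the smallest nonzero p-value this rule can produce is 1/10,000, which is why a large number of realizations is needed to resolve very small p-values.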

There is an alternative way to aggregate the 
residuals by using a “moving sum” rather than 
a “cumulative sum”. We can define a moving
sum of residuals, with “window” $b$, as
follows,

$$ W_k(x, b) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(x - b < X_{ijk} \le x)\, r_{ij}. $$


This represents the sum of residuals in blocks 
of window size, b. Similarly, to assess linearity 
(or the link function), we can define a mov- 
ing sum of residuals with respect to the fitted 
values, 


$$ W_f(x, b) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(x - b < X_{ij}'\hat{\beta} \le x)\, r_{ij}. $$


A potential advantage of using a moving sum 
of residuals is that the process is less influ- 
enced by the residuals associated with small 
covariate values. One disadvantage of moving 
sums, however, is that they require a somewhat 
arbitrary choice of window size, b. Simulation 
results suggest that the optimal choice of b is 
approximately the range of the lower half of the 
covariate values. 
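As a rough illustration of the moving sum, the following Python sketch evaluates it at a single point for a given window. It assumes residuals from an already fitted model, stored in a flat list in the same order as the covariate values; the names and the toy data are hypothetical.

```python
from math import sqrt


def moving_residual_sum(covariate, residuals, n_subjects, x, b):
    """Scaled sum of the residuals whose covariate value v satisfies x - b < v <= x."""
    total = sum(r for v, r in zip(covariate, residuals) if x - b < v <= x)
    return total / sqrt(n_subjects)


# Toy data (hypothetical): 4 subjects, 2 occasions each.
toy_covariate = [1, 2, 3, 1, 2, 3, 4, 4]
toy_residuals = [2.0, -1.0, 1.0, 1.0, -3.0, 2.0, -1.0, -1.0]
window_value = moving_residual_sum(toy_covariate, toy_residuals, 4, x=3, b=2)
```

By construction, the moving sum at x with window b equals the cumulative sum at x minus the cumulative sum at x - b, so residuals at covariate values of x - b or below drop out of the window. When b exceeds the range of the covariate, the moving sum reduces to the cumulative sum.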

To illustrate the use of sums of residuals, 
we will consider assessing the adequacy of 
the longitudinal model for percent body fat 
introduced earlier. Figure 13.9 shows a plot 
of the observed cumulative sum of the resid- 
uals (solid curve), with respect to the covari- 
ate time (relative to age of menarche). On the 
vertical axis is the cumulative sum of residuals;
the horizontal axis denotes time (in years).
Superimposed on the graph are two realiza- 
tions (dotted curves) from the null distribu- 
tion under the assumption that the model for 
the mean response is correctly specified. These 
two realizations are computer simulated from 
the appropriate Gaussian mean-zero process. 
By comparing the observed cumulative sum to 
many different realizations under the null, it 
is possible to determine whether any apparent 
trend is systematic or due to chance fluctuations.
From Figure 13.9, the simulated realizations
produce curves that appear to be closer to
and intersect zero more often than the observed
curve.

Figure 13.9 Plot of observed cumulative sum of
residuals versus time (relative to age of menarche)
and 2 simulated realizations from the null
distribution assuming a correctly specified model
for mean percent body fat. Note: Supremum p-value
is based on 10,000 simulated realizations from the
null distribution.

By generating many more such realizations
from the null distribution, it is
possible to get both a graphical and numerical
indication of whether the curve describing the 
observed cumulative sum displays a systematic 
pattern or simply natural variation. Figure 13.10 
shows a plot of the observed cumulative sum 
of the residuals and 10,000 realizations from 
the null distribution. It would appear that the 
observed cumulative sum displays a systematic 
pattern. In particular, the observed cumulative 
sum is too small in the 12 months after menar- 
che (years 0 to 1) and too large 2 to 4 years 
after menarche. This suggests that the assumed 
functional form for time, in particular after 
menarche, may not be adequate. This graphi- 
cal assessment of fit can be complemented by 
a numerical assessment. The maximum abso- 
lute value of the observed cumulative sum is 
18.28. The so-called supremum test yields a p-value
of 0.0002, based on the 10,000 simulated
realizations of the process under the null. That
is, out of 10,000 simulated realizations, only 2
had a maximum absolute value that exceeded
18.28. Thus, both the graphical and numerical
results suggest that the functional form for time,
in particular after menarche (time = 0), may be
inappropriate.

Figure 13.10 Plot of observed cumulative sum of
residuals versus time (relative to age of menarche)
and 10,000 simulated realizations from the null
distribution assuming a correctly specified model
for mean percent body fat. [Panel title: Checking
functional form for time; vertical axis: Cumulative
residuals; horizontal axis: Time; Pr > MaxAbsVal: 0.0002]

A similar plot can be constructed based on 
a moving sum rather than a cumulative sum. 
Figure 13.11 shows a plot of the observed mov- 
ing sum of the residuals, with block size equal 
to half the range of time (approximately 5.5 
years). The observed curve in Figure 13.11 also 
suggests that the moving sum of the residuals 
is too small in years 0-1 and too large in later 
years. In a similar fashion, we can complement 
this graphical display with a numerical assess- 
ment. The supremum test yields a p-value equal 
to 0.0004 (based on 10,000 simulated realiza- 
tions), suggesting that the functional form for 
time may be inappropriate. 

Figure 13.11 Plot of observed moving sum of
residuals versus time (relative to age of menarche)
and 10,000 simulated realizations from the null
distribution assuming a correctly specified model
for mean percent body fat. [Panel title: Checking
functional form for time]

Next, we consider a refinement to the model
for percent body fat to allow for a quadratic
trend in the postmenarcheal period. In particular,
we assume that each girl has a piecewise
linear-quadratic growth curve with a knot at the
time of menarche and fit the following linear
mixed effects model

$$
\begin{aligned}
E(Y_{ij} \mid b_{1i}, b_{2i}, b_{3i}, b_{4i}) ={}& \beta_1 + \beta_2 t_{ij} + \beta_3 (t_{ij})_+ + \beta_4 (t_{ij})_+^2 \\
&+ b_{1i} + b_{2i} t_{ij} + b_{3i} (t_{ij})_+ + b_{4i} (t_{ij})_+^2,
\end{aligned}
$$

where $(t_{ij})_+ = t_{ij}$ if $t_{ij} > 0$ and $(t_{ij})_+ = 0$ if $t_{ij} \le 0$.
The random effects, $b_{1i}, b_{2i}, b_{3i}, b_{4i}$, are assumed
to have a multivariate normal distribution, with
zero mean. In this model, each girl has a separate
growth curve that can be described in terms
of a linear trend for changes in response before
menarche, and a quadratic trend for changes in
response after menarche.
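Because the random effects have zero mean, averaging the model over girls gives the population mean curve, which may help in reading the fixed effects. The following is simply a worked restatement of the fixed-effects part of the model, written separately for the two sides of the knot:

```latex
E(Y_{ij}) =
\begin{cases}
\beta_1 + \beta_2\, t_{ij}, & t_{ij} \le 0, \\[4pt]
\beta_1 + (\beta_2 + \beta_3)\, t_{ij} + \beta_4\, t_{ij}^2, & t_{ij} > 0.
\end{cases}
```

The two pieces agree at $t_{ij} = 0$ (both equal $\beta_1$), so the mean curve is continuous at the knot. The slope changes from $\beta_2$ before menarche to $\beta_2 + \beta_3$ immediately after, and $\beta_4$ governs the curvature thereafter.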

The REML estimates of the fixed effects,
$\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$, are displayed in Table 13.2.
These results suggest that there is significant
nonlinearity in the postmenarcheal trend. The
estimate of $\beta_4$ indicates that increases in percent
body fat are greatest around the time of
menarche but level off at approximately 4 years
following the onset of menarche. The results
also suggest that there is no significant increase
in percent body fat during the 3 to 4 years prior
to menarche.

Table 13.2 Estimated regression coefficients (fixed
effects) and standard errors for the piecewise
linear-quadratic model for the percent body fat data

Variable       Estimate   SE       Z
Intercept      20.4201    0.5817   35.10
Time           -0.0155    0.1612   -0.10
(Time)_+        4.8439    0.4055   11.94
(Time)_+^2     -0.6469    0.0772   -8.38
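The statement that the increase levels off roughly 4 years after menarche can be checked directly from the estimates in Table 13.2. The short Python sketch below evaluates the estimated population mean curve and solves for the time at which the postmenarcheal slope is zero; the function and variable names are illustrative assumptions, and only the four estimates come from the table.

```python
# Fixed-effect estimates taken from Table 13.2.
B1, B2, B3, B4 = 20.4201, -0.0155, 4.8439, -0.6469


def mean_percent_body_fat(t):
    """Estimated population mean at time t (years relative to menarche)."""
    t_plus = max(t, 0.0)  # truncated time: zero before menarche
    return B1 + B2 * t + B3 * t_plus + B4 * t_plus ** 2


# After menarche the slope is (B2 + B3) + 2 * B4 * t, which is zero at:
turning_point = -(B2 + B3) / (2 * B4)

slope_before = B2        # years before menarche
slope_at_knot = B2 + B3  # immediately after menarche
```

Here `turning_point` works out to about 3.7 years, in line with the "approximately 4 years" quoted above. The estimated slope is essentially zero before menarche and about 4.8 percentage points per year immediately afterwards.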

For this revised model, we consider scatter- 
plots of the (transformed) residuals versus 
(transformed) time (see Figure 13.12). The 
scatter-plots of the transformed and untrans- 
formed residuals do not reveal any obvious sys- 
tematic trends. When lowess smoothed curves 
are superimposed on the scatter-plots, the cur- 
vature that was apparent in Figure 13.8(a) 
is no longer discernible in Figure 13.12(a). 
The inclusion of a quadratic trend in the
postmenarcheal period has led to an improvement
in fit as determined by both the Wald test
for the quadratic trend (Z = -8.38, p < 0.0001)
and the examination of residual diagnostics. 
Similarly, we can assess the adequacy of the 
quadratic trend model using cumulative and 
moving sums of residuals. Figure 13.13 shows 
a plot of the observed cumulative sum of the 
residuals, with respect to the covariate time; 
superimposed on the graph are 10,000 realiza- 
tions from the Gaussian mean-zero null distri- 
bution. This plot suggests there is no systematic 
trend in the observed curve. This is confirmed 
by a numerical assessment. The maximum absolute
value of the observed cumulative sum is
8.46, with corresponding p-value for the supre- 
mum test equal to 0.174. A similar plot can 
be constructed based on a moving sum and 
yields the same conclusion. Thus, both the 
graphical and numerical results suggest that the 
functional form for time (i.e., piecewise linear- 
quadratic) is adequate for these data. An overall 
assessment of linearity (or the link function), 
assessing the need for a transformation in Y 
or the mean of Y, can be based on a plot of
the cumulative sum of residuals with respect
to the fitted values (see Figure 13.14). This plot
also suggests there is no systematic trend in the
observed curve; the p-value for the supremum
test is equal to 0.177.

Figure 13.12 Scatter-plot of (a) the transformed
residuals versus transformed time, and (b) the
untransformed residuals versus time, for the
revised model for the percent body fat data

Figure 13.13 Plot of observed cumulative sum of
residuals versus time (since menarche) and 10,000
simulated realizations from the null distribution
assuming a correctly specified model for mean
percent body fat

Figure 13.14 Plot of observed cumulative sum of
residuals versus the fitted values and 10,000
simulated realizations from the null distribution
assuming a correctly specified model for mean
percent body fat


Finally, it is worth emphasizing that the 
graphical and numerical methods based on 
cumulative and moving sums of residuals are 
valid regardless of the true joint distribution 
of the longitudinal response vector; in particu- 
lar, they do not require correct specification of 
the covariance among the responses. As such, 
these graphical and numerical techniques for 
assessing the model for the mean response are 
relatively robust to assumptions about the distri- 
bution of the responses and assumptions about 
the covariance among the repeated measures. 


4 Conclusion 


In this chapter we have reviewed graphical 
techniques that are useful at both the early 
and later stages of longitudinal data analysis. 
We have seen that time plots and smoothed 
plots of the mean response over time, often 
stratified by covariates, can be helpful in deter- 
mining trends in the mean response over time 
and the appropriate functional form for covari- 
ates. Graphical techniques, based on residuals, 
are especially useful for assessing the adequacy 
of any postulated model for longitudinal data. 
They are also useful for identifying observations 
and individuals that are potential outliers. 

Of note, the focus of the graphical and numer- 
ical techniques discussed in this chapter has 
been on the model for the mean response. This 
reflects the fact that the primary goal of many 
longitudinal studies is to assess changes in the 
mean response over time and the factors that 
influence change. To a large extent, the covari- 
ance among repeated measures on the same 
individuals is regarded as a nuisance character- 
istic of the data and is of secondary interest. Of 
course, this does not imply that the covariance 
can be disregarded or simply ignored. Indeed, 
the covariance among repeated measures must 
be properly accounted for to assure valid infer- 
ences. Graphical techniques also have a role 
in the assessment of the adequacy of the vari- 
ance and correlation assumptions in longitudi- 
nal data analysis. For example, the adequacy 
of the variance assumption can be informally
assessed by examining the scatter-plot of the
transformed residuals versus the transformed
predicted values and/or time. In a correctly
specified model for the variance, the range of
the transformed residuals should be approximately
constant over (transformed) time and for
varying $\hat{\mu}_{ij}$. A more informative plot is obtained
by considering the scatter-plot of the absolute
values of the transformed residuals, $|r^*_{ij}|$, versus
$\hat{\mu}_{ij}$ and/or (transformed) time. If the assumed
model for the variance is adequate, there should 
be no systematic trend. An informal check on 
the overall adequacy of the model for the covari- 
ance, both the models for the variances and cor- 
relations, is provided by a smoothed plot of the 
so-called empirical semi-variogram. A detailed 
description of the use of the semi-variogram for 
longitudinal data can be found in the article by 
Laird et al. (1992), Chapter 10 (Section 10.4) 
of Verbeke and Molenberghs (2000), Chapter 
3 (Section 3.4) of Diggle et al. (2002), and
Chapter 9 (Section 9.4) of Fitzmaurice, Laird 
and Ware (2004). 


Software 


The transformed residuals discussed in 
Section 3.2 can be produced as standard output 
from some statistical packages. For example, 
they can be obtained using the “normalized” 
residuals option with the lme function in
S-PLUS and with the VCIRY option with PROC 
MIXED in SAS. Model checking based on aggre- 
gate residuals can be implemented using the 
ASSESS statement in PROC GENMOD in SAS. 
Because statistical software is constantly evolv- 
ing, all of the techniques discussed in this 
chapter should soon be available within most 
of the major statistical packages. 
Chapter 14


Separating age, period, and cohort 
effects in developmental and 
historical research 
Scott Menard 


This chapter deals with a fundamental issue 
in longitudinal research, the separation of 
developmental (age) effects, historical (period) 
effects, and the effect of experiencing certain 
historical events at a certain age (cohort effects). 
Basic to this issue is a discussion of alternative 
dimensions on which we can measure time in 
the analysis of change. Section 1 deals with age 
and period as time dimensions; Section 2 with 
age, period, and cohort as explanatory variables; 
and Section 3 with the conceptual status of 
cohort as a unit of analysis. Section 4 illus- 
trates the dummy variable regression approach 
to analyzing age, period, and cohort effects, sug- 
gesting why its use has declined after some ini- 
tial popularity. Sections 5 and 6 describe the 
conceptual approach to analyzing changes over 
time and age, not only in values of variables 
but also in relationships among variables, and 
Section 7 concludes the chapter. 


1 Age and period as alternative 
dimensions of time 


In longitudinal research, change is typically 
measured with reference to one of two continua:
chronological time (hereafter simply
time) or age. Time is measured externally to the
cases or subjects being studied (e.g., 7:15 p.m., 
October 26, 2006). Age is measured internally, 
relative to the subject or case under study (e.g., 
twenty-five years since birth). The choice of 
time or age as the underlying continuum for 
measuring change may be important, and for 
some purposes it may be useful to consider both 
in the same analysis. Also important is the dis- 
tinction between age-related differences when 
age is measured cross-sectionally (differences 
between subjects who are 40 years old and sub- 
jects who are 50 years old in 1990) and age 
measured longitudinally (differences between 
subjects who are 40 years old in 1990 and those 
same subjects when they are 50 years old in 
2000). When age is measured cross-sectionally, 
the differences between the values of variables 
for 40-year-olds and the values of variables for 
50-year-olds may be interpreted as differences 
between birth cohorts or age groups at a particu- 
lar time. When age is measured longitudinally, 
the differences may be interpreted as develop- 
mental differences within a cohort or age group 
over time. 

The demographic definition of a cohort is 
provided by Glenn (1977, p. 8): “A cohort is 
defined as those people within a geographically
or otherwise delineated population who expe- 
rienced the same significant life event within 
a given period of time.” A similar definition 
is offered by Ryder (1965, p. 845): “A cohort 
may be defined as the aggregate of individu- 
als (within some population definition) who 
experienced the same event within a given 
time interval.” Both Glenn and Ryder noted 
that although the term “cohort” is usually used 
to refer to birth cohorts, one may also define 
cohorts in terms of year of marriage or divorce, 
year of first employment or retirement, or year 
of occurrence of other events. 


2 Age, period, and cohort as 
explanatory variables 


Hobcraft et al. (1982), in a thorough discussion 
of age, period, and cohort as explanatory vari- 
ables, noted that age is “a surrogate—probably 
a very good one in most applications—for 
aging or more generally for physiological states, 
amount of exposure to certain social influences, 
or exposure to social norms.” Although it would 
be desirable to replace age by the variables for 
which it is a surrogate, age may generally be 
expected to perform quite well as an explana- 
tory variable. Indeed, it is possible that age, 
which can be measured with some precision, 
may be a more valid measure of such underly- 
ing variables than more direct but potentially 
less reliable measures (e.g., survey measures) of 
exposure to norms or other social influences. 
Diagnostic measures of physiological states may 
be more accurate, but also much more costly, 
than simply asking a respondent his or her 
age. Although imperfect, then, age appears to 
be a reasonable choice as an explanatory vari- 
able. Age may be measured cross-sectionally 
or longitudinally. When it is measured only 
cross-sectionally, age differences are the same 
as cohort differences, and the impact of age 
cannot be separated from the impact of being
in a particular cohort. To the extent that we
draw conclusions about developmental differ- 
ences over the life course from purely cross- 
sectional data, we are assuming that there are 
no differences associated with being in different
cohorts. Mathematically, however, in purely
cross-sectional data being a certain age and
being in a certain cohort are identical, and
their effects cannot be separated.

Hobcraft et al. (1982) also assert that “ ‘Period’ 
is a poor proxy for some set of contempora- 
neous influences, and ‘cohort’ is an equally 
poor proxy for influences in the past. Measured 
‘effects’ of periods and cohorts are thus mea- 
sures of our ignorance: in particular, of whether 
the factors about which we are ignorant are more
or less randomly distributed along chronolog- 
ically measurable dimensions.” If we measure 
age for a single cohort across multiple periods, 
being a certain age is mathematically identical 
to being in a certain period, and the impacts 
of developmental change (age) and historical 
change (period) cannot be distinguished. With 
multiple ages, periods, and cohorts, mathematically
cohort (year of birth) = period (calendar
year) - age (years since birth), and because the
three are linearly dependent, we cannot separate
one (linear) effect from the others (in the
case in which each is hypothesized to have a 
linear effect on some outcome). This situation 
of linear dependence posed a critical problem 
for the joint analysis of age, period, and cohort 
effects, because the effect of any one of the vari- 
ables could, mathematically, just as well be an 
effect of the other two. For example, an appar- 
ently linear decline in fertility or crime could 
be interpreted as a period effect, or as a combi- 
nation of age and cohort effects, since period = 
age + cohort. 
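The consequence of this linear dependence can be illustrated with a small Python sketch. It is a toy demonstration under assumed numbers, not an analysis of real data: the coefficients, ages, and years are hypothetical, and the only relationship taken from the text is cohort = period - age.

```python
def predict(intercept, b_age, b_period, b_cohort, age, period):
    """Linear prediction from age, period, and cohort (year of birth)."""
    cohort = period - age  # the identity that creates the linear dependence
    return intercept + b_age * age + b_period * period + b_cohort * cohort


# A small age-by-period grid (hypothetical values).
rows = [(age, period) for age in (20, 30, 40) for period in (1980, 1990, 2000)]

# Explanation A: a pure period effect.
pure_period = dict(intercept=100.0, b_age=0.0, b_period=-0.5, b_cohort=0.0)
# Explanation B: no period effect, equal age and cohort effects.
age_plus_cohort = dict(intercept=100.0, b_age=-0.5, b_period=0.0, b_cohort=-0.5)

fits_a = [predict(age=a, period=p, **pure_period) for a, p in rows]
fits_b = [predict(age=a, period=p, **age_plus_cohort) for a, p in rows]

# The two explanations give identical predictions for every cell of the grid.
identical = all(abs(x - y) < 1e-9 for x, y in zip(fits_a, fits_b))
```

Since both coefficient sets reproduce the same predicted values in every age-by-period cell, no amount of data of this form can distinguish a linear period effect from the corresponding combination of linear age and cohort effects.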

Mason et al. (1973) developed a
dummy variable regression method for para- 
meterizing age, period, and cohort effects, in 
an effort to overcome the problem of linear 
dependence among the three variables when 
age is measured as age at last birthday (integer 
number of years since birth, e.g., 25), period
is measured as current year (e.g., 1990), and 
cohort, implicitly birth cohort, is measured as 
year of birth (e.g., 1965). Publication of Mason 
et al.’s (1973) proposed solution to this problem 
of linear dependence produced three responses. 
One response was a series of papers that used 
real and hypothetical data to demonstrate the 
limitations (particularly sensitivity to assump- 
tions used in model specification) and poten- 
tially inappropriate uses of the dummy variable 
regression technique (Glenn, 1976; 1977; 1981; 
Greenberg and Larkin, 1985; Rodgers, 1982), 
and defenses of the robustness and usefulness 
of the method (Knoke and Hout, 1976; Mason 
et al., 1976; Smith et al., 1982). A second 
response was to modify the method somewhat 
(e.g., Maxim, 1985; Wright and Maxim, 1987) or 
to develop a method which explicitly attempts 
to avoid the problems with the dummy vari- 
able regression technique by eliminating at least 
one of the three possible influences (Palmore, 
1978). A third response was a proliferation of 
papers using either the dummy variable regres- 
sion technique or the technique developed by 
Palmore to study the effects of age, period, and 
cohort on a variety of topics, including crime 
and delinquency (e.g., Lab, 1988; Maxim, 1985; 
Pullum, 1977; Smith, 1986; Steffensmeier et al., 
1987), suicide (e.g., Wasserman, 1987), alcohol
and drug use (e.g., O’Malley et al., 1984), fertil- 
ity (e.g., Wright and Maxim, 1987), divorce (e.g., 
Carlson, 1979) and other phenomena. After a 
burst of activity over about a 20-year period, the 
use of the dummy variable regression approach 
to the analysis of age, period, and cohort 
effects has become much less frequent in these 
areas. 


3 Cohort as a unit of analysis 


Both Glenn (1977) and Ryder (1965) in their
definitions of “cohort” noted that although the
term “cohort” is usually used to refer to birth
cohorts, one may also define cohorts in terms of
year of marriage or divorce, year of first employ- 
ment or retirement, or year of occurrence of 
any number of other events. Graetz (1987) used 
the term event cohorts to describe cohorts other 
than birth cohorts. He dealt specifically with 
cohorts defined in terms of the year of attain- 
ment of highest level of education. To the extent 
that an event is not dependent on age or period, 
an event cohort is not linearly dependent on 
age or period. As an example, consider the 
event cohort of year of maximum educational 
attainment from Graetz (1987). One may ter- 
minate one’s education with two years of high 
school at age 16, or with a Ph.D. at age 35. 
People drop out of school and obtain their doc- 
torates every year, so there is likely to be lit- 
tle relationship between period and the event 
cohort defined by maximum educational attain- 
ment. There is likely to be a nonlinear rela- 
tionship between age and the end of formal 
education, with peaks at age 18 and 22 (high 
school and college graduation), and an increase 
in the absolute number but not the rate (num- 
ber/population) of those who end their educa- 
tion in successive years (because of population 
growth). Note that the event cohort is defined 
in terms of those who terminate their educa- 
tion at any given level, not, for example, at 
college graduation, so increasing levels of edu- 
cation may not be clearly reflected in age or 
period patterns involved in the termination of 
education. 

In the above descriptions and definitions of 
cohorts, the absence of any linear dependence 
on age and period for some (not all) event 
cohorts is a relatively minor point, distinctly 
secondary in importance to the more fundamen- 
tal point that cohorts, as aggregates of individ- 
uals, are units of analysis (units upon which 
measurement is performed) or cases for study. 
It is in this sense that the term cohort is primar- 
ily used by Ryder (1965), and implemented in 
some studies (Lloyd et al., 1987; Wetzel et al., 
1987; see also Joshi, Chapter 5, and Mayer, 
Chapter 6, in this volume). Even in some studies
which viewed cohort primarily as an explana- 
tory variable, it was also recognized as a unit 
of analysis (e.g., Wright and Maxim, 1987). 
An analogy may be drawn between cohorts, 
which are defined in terms of time and implic- 
itly limited to certain geographic or political 
boundaries, and aggregations such as census 
tracts, cities, and nations, which are defined 
in terms of geographic and political bound- 
aries and implicitly limited to some period in 
time. An example of the latter would be the 
26 American cities for which data on crimi- 
nal victimization and other variables were col- 
lected in the early 1970s (US Department of 
Justice, 1975a; 1975b; 1976), and upon which 
several studies of victimization have been based 
(e.g., Booth et al., 1977; Decker, 1980; Menard 
and Covey, 1988; O’Brien, 1983). Both cohorts 
and cities represent aggregates of individuals. 
Both may be used as cases (rather than vari- 
ables) in data analysis. Both may have charac- 
teristics, such as size and composition, which 
are aggregate in nature, not reflected in specific 
individual members of those aggregates. It is 
these characteristics that are most appropriately 
treated as variables, rather than cohorts or cities 
themselves. 

Cohorts are aggregates of individuals. Ages 
and periods are aggregates of years or other 
units of time. In social and behavioral science 
research, cohorts have measurable characteris- 
tics, some of which are inherently aggregate in 
nature (size, gender, or ethnic composition) and 
others which are summations (total number of 
arrests) or averages (median lifetime income) of 
the characteristics of the individuals who com- 
prise the cohort. By contrast, we do not measure 
the aggregate characteristics of ages or periods 
as such, but instead we measure aggregate char- 
acteristics of individuals or other units of analy- 
sis during a particular period or age. Age 15 and 
the year 1980 are not differentiated from other 
ages or years by size (unless you count leap 
years) or composition; the cohort born in 1965
is differentiated from other cohorts by size and
composition. Cohorts, which are aggregates of 
individuals, may be units of analysis. Age and 
period, which are aggregates of years or other 
units of time, are variables, and they may be 
used to delimit the units of analysis for a par- 
ticular study (e.g., those 15 years old in 1980), 
but they are not themselves units of analysis in 
sociological research. 

In this respect, cohorts are qualitatively dif- 
ferent from ages or periods, neither of which 
can be described as aggregates of individuals 
bounded by space and time. Considered in this 
light, the term “cohort effect” takes a pecu- 
liar meaning, and refers to the “effect” of the 
unit of analysis. By analogy, one could speak of 
“city effects” when cities are the units of anal- 
ysis. Operationally, they may be treated in the 
same way. One may examine differences among 
cohorts or cities on some set of dependent vari- 
ables, e.g., in an analysis of variance, or one may 
use characteristics of the units of analysis such 
as size or composition to try to explain differ- 
ences among the units of analysis. The former 
strategy raises important questions of interpre- 
tation, and should almost always lead to the lat- 
ter strategy. In other words, if we establish that 
a difference exists (between cohorts or cities), 
the next step is to explain why that difference 
exists. 

Implementation of the second strategy is not 
necessarily difficult. According to Ryder (1965), 
“A cohort’s size relative to the sizes of its neigh- 
bors is a persistent and compelling feature of 
its lifetime environment.” Mason et al. (1976) 
noted that age, period, and cohort are proxies 
for unmeasured variables and indicated that “if 
cohort size is the variable which causes differ- 
entiation in the context of a specific substantive 
problem, then, if size measurements can be con- 
structed, it is unnecessary to include cohorts as 
such in the specification because the preferred 
variable is available.” They continued by not- 
ing that use of cohort size potentially eliminates 
the linear dependence problem (unless cohort 
size increases linearly with time) and hence
the problem of estimability for linear regres- 
sion, and makes the results of the analysis less 
tentative. Hobcraft et al. (1982) observed that 
age, period, and cohort (measured as year of 
birth) are all proxies for other variables, and 
suggested that for cohort and period in par- 
ticular, “when these factors can themselves be 
directly measured, there is no reason to probe 
for period or cohort effects.” Ryder (1965) noted 
that size is only one of several characteristics 
which may be used to differentiate cohorts from 
one another; however, cohort size is the cohort 
characteristic that has played the most impor- 
tant role in research since the publication in 
1968 of Easterlin’s work regarding the impact 
of cohort size on the labor force, and his sub- 
sequent work, published in 1980, regarding the 
impact of cohort size on a variety of social prob- 
lems, including unemployment, divorce, and 
crime (Easterlin, 1987). Still, cohort size need 
not be emphasized to the point that other poten- 
tially important cohort characteristics become 
neglected or excluded. The point is that cohort, 
measured as year of birth, has sometimes been 
used when cohort size or some other cohort 
characteristic would have been more appropri- 
ate for studying age, period, and cohort effects. 
This may stem at least in part from a fail- 
ure to recognize that age, period, and cohort 
have qualitatively different statuses as explana- 
tory variables. Although, as noted above, cohort 
may technically be treated as an explana- 
tory variable, it is generally not appropriate 
to do so. 

For example, some analyses of the Easterlin 
(1987) relative cohort size hypothesis have 
first proceeded by calculating a dummy vari- 
able regression model (following Mason et al., 
1973), then by examining the zero-order cor- 
relation between the magnitude of the dummy 
variable parameters and cohort size (Maxim, 
1985; Smith, 1986; Steffensmeier et al., 1987). 
In this instance, it would have been more 
appropriate to include cohort size, rather than 
year of birth, in the original predictive equations. 
This latter approach, which was used by 
other researchers (Elliott et al., 1989; Menard, 
1992; Menard and Elliott, 1990; Menard and 
Huizinga, 1989; O’Brien, 1989), has the method- 
ological advantage of eliminating the estima- 
bility problem for which the Mason et al. and 
Palmore techniques were proposed as solu- 
tions, and the conceptual advantage of testing 
the hypothesis directly rather than indirectly. 
In addition, it may alleviate the aforementioned 
problem that the dummy variable regression 
technique may be highly sensitive to assump- 
tions made to identify the model, and may cap- 
italize on chance variation in estimating the 
model parameters (Greenberg and Larkin, 1975; 
Rodgers, 1982). 
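
The estimability problem, and the way a measured cohort characteristic avoids it, can be sketched numerically. The following is a minimal illustration, not a reanalysis of any study cited here: the integer coding of age and period, the variable names, and the randomly generated cohort sizes are all assumptions made for the example. It checks that a design matrix containing linear terms for age, period, and year of birth is rank deficient, whereas substituting a cohort size that does not change linearly with time gives a matrix of full column rank.

```python
import numpy as np

# Hypothetical layout: 7 age groups by 5 periods, coded as consecutive integers
# (an assumed coding, used only for illustration).
ages = np.arange(7)
periods = np.arange(5)
age, period = [m.ravel() for m in np.meshgrid(ages, periods, indexing="ij")]

# Cohort measured as year of birth is fixed exactly by period minus age.
cohort = period - age

# Intercept plus linear age, period, and year-of-birth terms.
X_birth = np.column_stack([np.ones(age.size), age, period, cohort])
rank_birth = np.linalg.matrix_rank(X_birth)  # one column is redundant

# Replace year of birth with made-up cohort sizes that are not linear in time.
rng = np.random.default_rng(0)
sizes = rng.uniform(2000, 4000, size=np.unique(cohort).size)
cohort_size = sizes[cohort - cohort.min()]
X_size = np.column_stack([np.ones(age.size), age, period, cohort_size])
rank_size = np.linalg.matrix_rank(X_size)

print(rank_birth, X_birth.shape[1])
print(rank_size, X_size.shape[1])
```

In this sketch the year-of-birth matrix has four columns but rank three, so its coefficients cannot be estimated without an added constraint; the cohort size matrix has rank four.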


4 Illustration of the dummy 
variable regression analysis of age, 
period, and cohort effects 


Table 14.1 illustrates these points with a reanal- 
ysis from Smith’s (1986) study of homicide 
arrests in which he used the dummy variable 
regression technique to calculate three models 
with slightly different assumptions.¹ The zeros 
in parentheses correspond to dummy variables 
omitted from each equation in order to iden- 
tify the model. Also in Table 14.1, ordinary 
least-squares (OLS) regression results using the 
same data are presented, but with cohort size 
in place of year of birth as the variable for 
cohort effects. From Table 14.1, four major 


¹Smith also used OLS regression equations that 
included age and cohort size, but not period, as pre- 
dictors of homicide rates for disaggregated annual 
age and period specific rates. The present analysis, 
for purposes of comparison, is limited to the aggre- 
gated (five-year interval) data used in the model that 
included all three types of effects: age, period, and 
cohort. 


Table 14.1 Homicide arrest rates 


Regression coefficients 


                            Smith (1986) dummy variable regression   Continuous variable OLS regression
Independent variable        Model 1      Model 2      Model 3        Unstandardized    Standardized
Age (all categories)        NA           NA           NA             -.278*            -.546
  (15-19)                   -.31         -12.86       -6.28
  (20-24)                   8.33         -2.15        3.21
  (25-29)                   8.33         -.64         4.10
  (30-34)                   6.54         .23          3.19
  (35-39)                   4.54         .40          2.15
  (40-44)                   2.23         (0)          (0)
  (45-49)                   (0)          (0)          (0)
Period (all categories)     NA           NA           NA             .178**            .418
  (1952-56)                 (0)          (0)          (0)
  (1957-61)                 (0)          -1.98        (0)
  (1962-66)                 -.07         -4.19        -1.42
  (1967-71)                 3.57         -2.62        1.35
  (1972-76)                 3.72         4.54         .62
Cohort (all: cohort size)   NA           NA           NA             .005*             .380
  (1903-07)                 (0)          (0)          (0)
  (1908-12)                 (0)          (0)          .60
  (1913-17)                 -.79         1.99         .89
  (1918-22)                 -1.48        3.38         1.08
  (1923-27)                 -1.73        5.21         1.72
  (1928-32)                 -1.76        7.28         2.72
  (1933-37)                 -.25         10.85        4.97
  (1938-42)                 1.18         14.36        7.18
  (1943-47)                 4.28         19.54        T27
  (1948-52)                 6.88         24.22        14.75
  (1953-57)                 8.32         27.74        17.06
Intercept                   5.76         7.16         6.10           -3.88             NA
Explained variance (R²)     .99          .99          .99            .47               .47


* Significant at the .05 level 
** Significant at the .01 level 
points should be made. First, with regard to the 
existence of nonlinear effects, the three models 
give inconsistent results for age and period. 
The peak age for homicide arrests (allowing 
for period and cohort effects) is 20-29 for 
Model 1, 35-39 for Model 2, and 25-29 for 
Model 3. Such instability might be attributed 
to the small number of ages and years, but 
Rodgers (1982) indicated that the dummy vari- 
able regression technique is typically sensi- 
tive to the assumptions made to identify the 
model, and such instability across models with 
different assumptions is to be expected. For 
cohort, Model 1 shows a decline followed by an 
increase in the cohort parameters, but Models 2 
and 3 show monotonic increases in the cohort 
parameters. The cohort parameters are highly 
correlated with cohort size: .88 for Model 1, 
.74 for Model 2, and .81 for Model 3. Cohort 
size thus explains 55-78% of the variance in 
the effects of cohorts, measured as year of birth. 
Briefly, then, the first point is that the dummy 
variable regression technique does not neces- 
sarily provide a reliable guide to identifying or 
describing the form of the relationship (linear 
as opposed to nonlinear effects). The dummy 
variable regression results are not unique for a 
given set of variables. Instead, as Smith (1986) 
demonstrated, the results vary, depending on 
which categories are excluded or set equal to 
one another in order to identify the model. 

The second major point is that the substantive 
conclusions from the dummy variable 
regression and OLS regression approaches may 
differ. Using the dummy variable approach, 
Smith concluded that cohort effects appear to 
be strongest. In the OLS approach (using cohort 
size instead of year of birth), the age effect 
appears to be strongest and the cohort size effect 
appears to be weakest, based on the standard- 
ized regression coefficients. All of the effects in 
the OLS equations are statistically significant at 
the .05 level, and the age and period effects are 
statistically significant at the .01 level.² Use of 
the dummy variable regression technique, then, 
will not necessarily produce the same results 
as use of continuous variables in a regression 
analysis. 

The third major point is that the results of 
the OLS approach with continuous variables are 
more readily interpretable than those obtained 
with the dummy variable regression approach. 
In the dummy variable regression analysis, the 
units of measurement are defined in terms of 
the omitted categories. In the OLS regression 
with continuous variables, the units of mea- 
surement are defined, not in terms of omit- 
ted categories, but in terms of more natural 
units of measurement: years (actually, five-year 
intervals in Smith’s data) for age and period, 
arrests per 100,000 population for the depen- 
dent variable, and thousands of births for cohort 
size. Based on the results using cohort size, we 
can say that the homicide arrest rate declines 
.278 arrests per 100,000 people (or about 3 per 
million) for every five-year increase in age; that 
it increases at a rate of .178 arrests per 100,000 
people (about 2 per million) every five years 
(historically); and that it increases 5 arrests per 
100 million people for every increase of 1000 
births in a cohort. 
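
The unit conversions in this paragraph are simple rescalings of the unstandardized coefficients in Table 14.1. As a quick check of the arithmetic, the sketch below converts each coefficient from a rate per 100,000 people to the base used in the text; the helper function and variable names are introduced here only for illustration.

```python
# Unstandardized OLS coefficients from Table 14.1 (arrests per 100,000 population).
b_age = -0.278     # per five-year increase in age
b_period = 0.178   # per five-year period
b_cohort = 0.005   # per 1,000 additional births in the cohort


def per_population(coef_per_100k, population):
    """Rescale a rate expressed per 100,000 people to a rate per `population` people."""
    return coef_per_100k * population / 100_000


age_per_million = per_population(b_age, 1_000_000)              # about -2.78
period_per_million = per_population(b_period, 1_000_000)        # about 1.78
cohort_per_100_million = per_population(b_cohort, 100_000_000)  # about 5

print(round(age_per_million, 2), round(period_per_million, 2), round(cohort_per_100_million, 2))
```

Rounded to whole numbers, these are the figures of about 3 per million, about 2 per million, and 5 per 100 million given above.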

A fourth point is that the results using cohort 
size make more intuitive sense than those 
obtained using year of birth. First, the number 


²The use of statistical significance tests here is 
consistent with the recommendation of Winch and 
Campbell (1969), who suggested the use of signifi- 
cance tests even with population data to allow us to 
evaluate the possibility that apparent relationships 
in the data are the result of haphazard variation in 
the data (chance, instability) rather than systematic 
relationships among variables. For a dissenting view 
of this use of significance tests, see Morrison and 
Henkel (1969). This debate lies outside the focus of 
the present chapter, and the results regarding statis- 
tical significance may be ignored without changing 
the conclusions reached here. 
of parameters is more manageable: three in 
the equation using cohort size, and 23 in 
the equation using year of birth. Second, the 
explained variance appears to be inflated in 
the dummy variable regression equations. This 
may be attributable at least in part to the fact 
that the number of parameters (23) is large rel- 
ative to the number of values of the dependent 
variable (35). The situation becomes worse for 
larger numbers of ages and periods; as the num- 
ber of ages and periods increases, the number 
of parameters to be estimated approaches the 
number of values of the dependent variable. 
Ideally, in regression analysis, one should have 
several times as many values of the dependent 
variable as there are parameters to be estimated. 
Otherwise, the parameter estimates capitalize 
on chance variation, and the explained variance 
is inflated (Johnston, 1984; Kleinbaum, Kupper 
and Muller, 1988). The 99% explained vari- 
ance in the dummy variable regression model 
is not so much an indicator of how well histor- 
ical, developmental, and cohort effects have an 
impact on aggregate homicide arrest rates as it 
is a reflection of the purely statistical properties 
of the dummy variable regression “accounting 
system” for age, period, and cohort effects; a 
comparable level of explained variance can be 
expected in such dummy variable regression 
models regardless of the substantive impact of 
the variables in question. The 47% explained 
variance in the OLS equation with cohort size, 
in contrast, is more plausible as an estimate of 
the combined impact of developmental, histori- 
cal, and cohort size influences on the homicide 
arrest rate. 
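
The counts cited for Smith's data follow from the shape of the age-by-period table. The sketch below assumes a complete table in which each diagonal defines one cohort, and it counts every age, period, and cohort category, including those set to zero to identify the model, which appears to be how the figure of 23 is reached. The function name is hypothetical.

```python
def apc_counts(n_ages, n_periods):
    """Count cells and dummy categories in a complete age-by-period table."""
    n_cohorts = n_ages + n_periods - 1             # one cohort per diagonal
    n_cells = n_ages * n_periods                   # values of the dependent variable
    n_categories = n_ages + n_periods + n_cohorts  # age + period + cohort categories
    return n_cells, n_cohorts, n_categories


# Seven five-year age groups and five five-year periods, as in Table 14.1.
cells, cohorts, categories = apc_counts(7, 5)
print(cells, cohorts, categories)
```

For seven age groups and five periods this gives 35 cells, 11 cohorts, and 23 categories, compared with three slope coefficients in the equation that uses cohort size.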

The point of all this is that cohort, mea- 
sured as year of birth, has sometimes been 
used when cohort size or some other cohort 
characteristic (or a nonlinear interaction term 
involving age and period) would have been 
more conceptually or theoretically appropriate 
for studying age, period, and cohort effects. 
This may stem at least in part from a failure 
to recognize that age, period, and cohort 
have qualitatively different conceptual statuses. 
Although, as noted above, cohort membership 
may be treated as an explanatory variable from a 
purely methodological viewpoint, theoretically 
and substantively it is not generally appropriate 
to do so. Age and period are more appropriate as 
explanatory variables, age more so than period 
(Hobcraft et al., 1982). Ideally, one would elim- 
inate period and cohort and replace them with 
the variables for which they act as proxies in 
any causal analysis. 


5 Period effects: Changes over time 


If we are concerned only with changes over 
chronological time (historical changes) and 
not with changes over age (developmental 
changes), we must either be certain that age is 
entirely irrelevant, or include age as an explana- 
tory variable, or control for age by making age- 
specific comparisons. One of the safest ways 
to approach the study of trends over time is to 
use age-specific comparisons. In an age-specific 
comparison, only those cases of a certain age in 
one year are compared with cases of the same 
age in some subsequent year. The age may rep- 
resent a single year (e.g., age 15) or a range 
of years (e.g., over age 65), and separate com- 
parisons may be made for all possible ages or 
age groups. Some variables are naturally age- 
specific, e.g., Scholastic Aptitude Test (SAT) 
scores (because the SAT is taken primarily by 
individuals aged 16-18) or infant mortality 
rates. In other instances, it is necessary to 
explicitly control for age. For example, Gold 
and his associates (Gold and Reimer, 1975; 
Williams and Gold, 1972) examined rates of 
self-reported delinquency among national probability 
samples of 13-16-year-olds in a repeated 
cross-sectional design, and found little evi- 
dence of change from 1967 to 1972. Menard 
(1987) obtained similar results for national 
probability samples of 15-17-year-olds from 
1976 to 1980. Covey and Menard (1987; 1988) 
examined trends in victimization and trends in 
arrests for those over age 65, and found that 
rates of arrests were generally increasing and 
rates of victimization were generally decreasing 
among this older age group. 

In each of the above-cited studies, control- 
ling for age produced relatively unambiguous 
evidence for the existence or absence of trends 
over time. Without such controls for age, it may 
be difficult to ascertain whether changes are 
historical or developmental in nature, even if 
the entire population is used instead of a sam- 
ple. Chilton and Spielberger (1971) examined 
changes in official crime rates, and found that 
much of the apparent change over time (what 
appeared on the surface to be a change in behav- 
ior) was attributable to changes in the age struc- 
ture, or, more specifically, to changes in the 
percentage of the population in the adolescent 
ages. Individual studies will vary, but in gen- 
eral it is appropriate to consider the possibil- 
ity that apparent period trends may actually be 
attributable to changes in age (at the individ- 
ual level) or age composition (at the aggregate 
level). 
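
A small numerical sketch may help show how a change in age composition alone can produce an apparent period trend. All of the rates and population shares below are invented for illustration and are not taken from Chilton and Spielberger (1971) or any other study cited here. The age-specific rates are held constant across the two years, so any change in the crude rate comes entirely from the shift in age structure.

```python
# Hypothetical age-specific arrest rates per 1,000, identical in both years.
rates = {"adolescent": 50.0, "adult": 10.0}

# Hypothetical population shares by age group in two years.
shares_year1 = {"adolescent": 0.10, "adult": 0.90}
shares_year2 = {"adolescent": 0.20, "adult": 0.80}


def crude_rate(rates, shares):
    """Crude rate: age-specific rates weighted by each group's population share."""
    return sum(rates[group] * shares[group] for group in rates)


crude1 = crude_rate(rates, shares_year1)  # 0.10 * 50 + 0.90 * 10
crude2 = crude_rate(rates, shares_year2)  # 0.20 * 50 + 0.80 * 10

print(crude1, crude2)
```

The crude rate rises from about 14 to about 18 per 1,000 even though no age group's behavior has changed, which is why an age-specific comparison would show no trend in this example.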

Another concern of longitudinal research is 
the examination of changes, not in values or 
levels of variables over time, but in relation- 
ships between or among variables over time. It 
is one thing, for example, to say that mortal- 
ity has been declining for over two centuries. 
It is another to indicate that in the early stages 
of mortality decline, reductions in mortality 
were achieved primarily by public health mea- 
sures (sanitation, access to safe drinking water, 
pasteurization, etc.) and medicine played lit- 
tle if any role, but in the later stages of 
the decline, advances in medicine (inocula- 
tion, antibiotics) rather than public health mea- 
sures were responsible for mortality declines 
(McKeown, 1976; McKeown and Record, 1962; 
McNeill, 1976). Hout et al. (1999) examined 
the relationship between social class (six cat- 
egories, from professional to less skilled blue 
collar) and voting behavior in American presi- 
dential elections from 1944 to 1992, and found 
different patterns for different classes. The 
tendency of the highest (professional) class 
to support Republican candidates increased 
over time, while the tendency of the three 
lower socioeconomic classes to vote Democratic 
declined over time, particularly for the nonpro- 
fessional self-employed and the skilled manual 
classes. 


6 Age effects: life cycle 
and developmental changes 


Baltes and Nesselroade (1979) listed five objec- 
tives or rationales for longitudinal (or more 
specifically, in their case, prospective panel) 
research: (1) direct identification of intraindi- 
vidual change, i.e., whether individuals change 
from one period to another; (2) direct identifi- 
cation of interindividual similarities or differ- 
ences in intraindividual change, i.e., whether 
individuals change in the same or differ- 
ent ways; (3) analysis of interrelationships in 
behavioral change, i.e., whether certain changes 
are correlated with each other; (4) analysis 
of causes or determinants of intraindividual 
change, i.e., why individuals change from one 
period to another; and (5) analysis of causes 
or determinants of interindividual similarities 
or differences in intraindividual change, i.e., 
why different individuals change in different 
ways from one period to another. All of these 
objectives are concerned with patterns of devel- 
opmental change, specifically at the individ- 
ual level, although they are easily extended to 
aggregate levels (groups, organizations, cities, 
nations). At the individual level, intraindi- 
vidual changes may include things people 
think (becoming more politically conservative), 
things they do (becoming employed, changing 
jobs, retiring), or things that are done to them 
(being arrested or being robbed). In the study of 
intraindividual change, age serves as a proxy for 
age-related physiological changes and exposure 
to social influences (Hobcraft et al., 1982) which 
may be difficult or costly to measure directly. 
For some purposes it may be reasonable to 
draw simple inferences about intraindividual 
change from cross-sectional data. For example, 
from cross-sectional data on rates of arrest and 
childbearing by age, we may reasonably infer 
that one’s likelihood of being arrested or of 
having a baby is practically nonexistent before 
age 7, increases in adolescence and early adult- 
hood, and diminishes substantially after age 
65. There would seem to be little chance that 
these age-related differences may be explained 
by period effects or cohort characteristics. On 
the other hand, it would not be safe to infer that 
people become more conservative and less edu- 
cated as they get older, based on cross-sectional 
data. Age differences in political attitudes at 
a particular period may reflect either changes 
in attitudes with age, or constancy in attitudes 
over the life cycle coupled with differences in 
attitudes between cohorts. If older people have 
less education than younger people, it is not 
because they become “de-educated”; a more 
plausible explanation is that educational attain- 
ment has increased over time (a period effect), 
resulting in differences in the average educa- 
tional level of successive cohorts. 

A more compelling need for longitudinal data 
arises if we wish to study “career” patterns 
of behavior. The most obvious application of 
this is in the study of labor market careers, 
from initial job entry through patterns of pro- 
motion, job change, job loss, and eventually 
either retirement or death. Closely related to 
this is the study of status attainment careers, 
which includes consideration of educational 
attainment as well as occupational status and 
income (e.g., Blau and Duncan, 1966). Other 
applications of the “career” perspective include 
marital histories (e.g., Becker et al., 1977), edu- 
cational attainment and the process of learning 
(e.g., Heyns, 1978), and criminal careers (e.g., 
Blumstein et al., 1986). Such studies have in 
common a concern with patterns of entry, con- 
tinuity, and exit from the behavior upon which 
the career is based, and with the correlates 
and potential causes associated with changes or 
discontinuities in the behavior (unemployment 
and obtaining a new job; divorce and remarriage; 
dropout and re-entry in education; suspension 
and resumption of criminal behavior). 
It is only with longitudinal data, and more 
specifically panel data, that many of the ques- 
tions regarding developmental career patterns 
may be answered. 

Life course research (Giele and Elder, 1998) 
is similar to the study of individual careers, 
but broadens the career paradigm to explicitly 
locate intraindividual change within a broader 
historical and social context. Integral to the life 
course perspective are issues of (1) location in 
time (history) and place (society and culture); 
(2) linked lives, the integration of individuals’ 
lives with one another at the interpersonal and 
social institutional levels; (3) human agency, 
the ability and tendency of individuals to set 
goals and decide how to pursue them; and 
(4) timing of lives, individuals’ making deci- 
sions about whether and when to act in cer- 
tain ways, or formulating strategies for living 
based not only on internalized goals, but also 
in response to external events or conditions. 
In contrast to perspectives that see life transi- 
tions as progressing through a fixed sequence 
of stages, the life course perspective recognizes 
the interindividual variation in the sequenc- 
ing of life transitions as responses to differ- 
ences in individual goals (human agency) and 
external influences (timing of lives). Life course 
research focuses on phenomena which can be 
adequately analyzed only with long-term longi- 
tudinal research: event histories or trajectories 
which differ across individuals in timing, dura- 
tion, or rates of change. 

Parallel to the earlier concern with examin- 
ing changes in the strength or pattern of rela- 
tionships from one period to another, we may 
want to examine changes in the strength or pat- 
tern of relationships from one age to another. 
Here again the issue of whether to base the 
comparison on cross-sectional (intercohort) or 
longitudinal (intracohort) data may arise, and 
as before the decision hinges on whether we 
are concerned with how well the developmental 
changes are reflected in the cross-sectional 
data. If longitudinal data are used, the issue of 
whether any change that occurs is attributable 
to age, period, or cohort effects must again be 
considered. 

Using data from the National Youth Sur- 
vey, a prospective longitudinal panel survey 
of respondents aged 11-17 in 1976 and 21-27 
in 1986, Menard et al. (1989) found that mar- 
riage during adolescence was positively asso- 
ciated with substance use and mental health 
problems, but that marriage during young adult- 
hood (ages 21-27) was negatively associated 
with substance use and mental health prob- 
lems. Being enrolled in school had a negative 
association with illegal behavior, substance use, 
and mental health problems in adolescence, 
but no association with illegal behavior, sub- 
stance use, or mental health problems for young 
adults. Wofford (1989), analyzing the same sam- 
ple, found that employment was associated 
with higher rates of serious illegal behavior in 
adolescence and lower rates of serious illegal 
behavior in young adulthood (ages 18-24 in 
this study). Substantively, these results require 
explanation. From a life course perspective, 
there may be age-specific norms for certain 
behaviors (school, marriage, work), and violat- 
ing those norms may place one at greater risk 
of involvement in illegal or problem behav- 
ior. Methodologically, these results suggest that 
relationships among variables may change over 
the life course, and that it may be appropri- 
ate to test for the existence of such changes. 
In addition, the explanatory power of differ- 
ent theories may vary across the life course. 
In a series of tests of strain and control the- 
ories of illegal behavior, Menard (1992; 1997; 
Menard et al., 1993) found that the explanatory 
power of the theories was uniformly weakest in 
early adolescence (ages 11-14) and stronger in 
later adolescence (ages 14-17) and early adulthood 
(ages 17-20) across a range of behaviors 
from minor delinquency and marijuana use to 
serious delinquency and hard drug use. With 
cross-sectional data, such differences may be 
attributed to age or to intercohort differences; 
with longitudinal data on multiple cohorts, 
it becomes possible to estimate the extent to 
which the differences are developmental, as 
opposed to period or intercohort differences. 


7 Conclusion 


Selection of the time dimension is a neces- 
sary first step in longitudinal research, and 
there may be more than one time dimension 
of interest, particularly in the study of indi- 
vidual as opposed to aggregate change over 
time (although it is also possible, as illus- 
trated above, that more than one time dimen- 
sion will be of interest for aggregate data as 
well). To the extent that longitudinal research 
involves multiple age groups over multiple 
time periods, separation of developmental and 
historical effects may be of interest, and for 
longer time periods and age spans, consider- 
ation of the impact of the interaction of age 
and period effects (being a certain age at a cer- 
tain time) may complicate the analysis of devel- 
opmental and historical trends. One approach 
is to treat cohorts as units of analysis, ana- 
lyzing them separately and noting the qualita- 
tive and quantitative differences among them in 
the explanation of differences in developmental 
and historical effects across cohorts. Another 
approach is to attempt to identify those char- 
acteristics of cohorts that may influence either 
historical and developmental outcomes (e.g., 
cohort size effects on unemployment rates), or 
may influence the impacts of history and devel- 
opment on those outcomes (e.g., cohort size 
effects on the relationship between unemploy- 
ment rates and criminal behavior), and incor- 
porate them in the analysis. 

Based on the foregoing discussion, we may 
conclude the following: (1) Age is an appro- 
priate explanatory variable, but not a unit of 
analysis. In some instances it may be appro- 
priate to replace age with other variables for 
which age is a proxy, but often using age itself 
is the most reliable and cost-efficient option. (2) 
Period is weakly appropriate as an explanatory 
variable, but not a unit of analysis. It will often 
be appropriate to replace period with either 
the specific historical event of interest (e.g., the 
Iraqi invasion of Kuwait), or better still with 
an indicator of exposure to that specific event 
(e.g., whether one was directly involved in the 
invasion or its aftermath, related to or knew 
someone directly involved, or merely aware of 
the event). (3) Cohort is appropriate as a unit of 
analysis, but is a poor or inappropriate explana- 
tory variable. Cohort as year of birth is best 
replaced by the relevant characteristic of the 
cohort (e.g., cohort size), or the relevant age- 
specific experiencing of or exposure to a spe- 
cific event (e.g., being 18 or 81 during the Iraqi 
invasion of Kuwait, or being age 18 or 81 for 
a more direct measure of exposure to the Iraqi 
invasion of Kuwait, as described above). Use 
of these guidelines allows us to better exam- 
ine developmental and historical influences 
on behavior, and to sort out developmental 
from historical from cohort-based influences on 
behavior. 


Author’s note 


Parts of this chapter were taken and/or adapted 
from my previous work, particularly Menard 
(2002); other parts are original to this chapter. 
Chapter 15 


An introduction to pooling 
cross-sectional and time series data 
John L. Worrall 


1 Introduction 


Pooling time series and cross-sectional data 
simply amounts to gathering repeated obser- 
vations on several units of analysis. Unfortu- 
nately, “pooling” is not the only term used to 
describe such research designs. Pooled datasets 
are most frequently called “panel datasets.” 
Other researchers prefer to describe their data as 
being of the “time series-cross section” (TSCS) 
variety. Still others use the term “multiple time 
series.” There is some disagreement in the liter- 
ature as to which term should apply, and when. 
For example, some have argued that panel data 
consist of relatively few observations on several 
units of analysis. In contrast, TSCS data con- 
tain many observations on relatively few units 
(e.g., Beck and Katz, 1995). Others, though, have 
argued that the same issues and methodological 
concerns present themselves regardless of how 
many units and/or time periods are included 
in a dataset (e.g., Kristensen and Wawro, 2003). 
The latter view is taken here. 


2 Three pooling problems 


At first glance, pooling time series and 
cross-section data would seem advantageous. 
A researcher who had data on, say, 50 units 
could do little in terms of quantitative analysis. 
By adding a hypothetical 10 time periods 
to each of those units, however, the sample 
size suddenly increases tenfold. Indeed, this 
increase in sample size is one of the key advan- 
tages of pooling. But pooling raises a number 
of issues that frequently used statistical techniques, 
such as ordinary least squares (OLS) 
regression, are unequipped to address. The 
most significant of those issues is the addition 
of a time dimension. 

OLS applied to a cross-sectional dataset 
raises no concern about autocorrelation, a 
problem that occurs when observations are not independent 
along the time dimension. By definition, 
cross-sectional data have no time dimension. 
However, when repeated observations are gath- 
ered for the same units, researchers cannot 
ignore serial dependence in the data. We know, 
for example, that an individual’s behavior is 
closely tied to his or her previous behavior. 
Moving to the macro level, we know that cer- 
tain government activities (such as budget allo- 
cations) are closely correlated from one time 
period to the next. 
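
Serial dependence of this kind is easy to see in simulated data. The sketch below is an illustration under assumed values rather than an analysis of real data: it generates 50 units observed over 10 periods, with each unit following a first-order autoregressive process whose persistence parameter (`rho`) is chosen arbitrarily, and then correlates each observation with the same unit's observation in the previous period.

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_periods, rho = 50, 10, 0.8  # rho is an assumed persistence parameter

# Simulate each unit's series: y_t = rho * y_{t-1} + noise.
y = np.zeros((n_units, n_periods))
y[:, 0] = rng.normal(size=n_units)
for t in range(1, n_periods):
    y[:, t] = rho * y[:, t - 1] + rng.normal(size=n_units)

# Pair every observation with the same unit's previous observation.
current = y[:, 1:].ravel()
previous = y[:, :-1].ravel()
lag1_corr = np.corrcoef(previous, current)[0, 1]

print(y.size, round(lag1_corr, 2))
```

Pooling yields 500 observations, but the strong lag-one correlation shows that they are far from 500 independent pieces of information.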

Also important is a variation of a problem 
that routinely appears in the OLS context: 
heteroskedasticity. A key OLS assumption is 
homoskedastic errors. When error variance is 
not constant across units, the result is het- 
eroskedasticity, and steps must be taken to 
correct for it. When time series and cross-section 
data are pooled, a potential result is 
panel heteroskedasticity. This occurs when the 
error variance varies across units (over time) 
due to characteristics unique to each unit. In 
the “ordinary” heteroskedasticity context, some 
units are more variable than others. The effects, 
though, may be relatively modest, as such het- 
eroskedasticity affects only one unit at a time. 
But when a time dimension is added, the effects 
can magnify in a manner equivalent to the num- 
ber of time periods (Stimson, 1985, p. 919). 

The third problem that arises in pooling time 
series and cross sections is known as hetero- 
geneity. This can occur when all units are 
affected by a “shock” during the same time 
period. A re-election, a downturn in the econ- 
omy, or some other sudden event could cause 
such a problem. More formally, the errors across 
each unit will be correlated due to the event 
they all experienced. Another form of het- 
erogeneity consists of time-stable differences 
between units. Once these sorts of problems are 
addressed, the resulting estimates can be inter- 
preted just like ordinary least squares estimates. 
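
The second and third problems can also be illustrated with simulated errors. In the sketch below, every number is an assumption made for the example: four units have different error standard deviations (panel heteroskedasticity), and a shock drawn in each period is shared by all units, which induces correlation in the errors across units. A real application would examine residuals from a fitted model rather than simulated errors.

```python
import numpy as np

rng = np.random.default_rng(7)
n_units, n_periods = 4, 200  # few units, many periods, to keep the estimates stable

# Assumed unit-specific error standard deviations (panel heteroskedasticity).
unit_sd = np.array([0.5, 1.0, 2.0, 4.0])
idiosyncratic = rng.normal(size=(n_units, n_periods)) * unit_sd[:, None]

# An assumed shock in each period that is shared by every unit.
common_shock = rng.normal(scale=2.0, size=n_periods)
errors = idiosyncratic + common_shock[None, :]

var_by_unit = errors.var(axis=1, ddof=1)  # error variance differs across units
cross_unit_corr = np.corrcoef(errors)     # rows are units; off-diagonals reflect the shared shock

print(np.round(var_by_unit, 1))
print(np.round(cross_unit_corr[0, 1], 2))
```

The unit with the largest assumed standard deviation shows the largest error variance, and the errors of different units are positively correlated because of the shared shock.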


3 Fixed effects or random effects: 
three considerations 


Two basic methods are available for analyzing 
panel data. For the sake of full disclosure, there 
are scores of techniques available (and continu- 
ally being developed), but most are built around 
some variation of the two basic choices: fixed 
effects models and random effects models. We 
begin with the basic cross-sectional OLS model, 
which looks something like this: 


$$y = a + \beta x + \varepsilon \tag{1}$$

The notation for (1) is in matrix form: $y$ is 
the dependent variable, $a$ is the intercept, $x$ 
is a vector of independent variables, $\beta$ represents 
the regression coefficients, and $\varepsilon$ is the 
error term. The pooled model simply extends 
(1) to: 


$$y_{it} = \alpha + \beta x_{it} + \varepsilon_{it} \tag{2}$$


In (2) the “i” and “t” subscripts denote that 
we have pooled observations over units “i” and 
time periods “t.” Note that there is nothing dif- 
ferent between (1) and (2) other than the fact 
that (2) acknowledges there are repeated obser- 
vations on the same units. (2) is sometimes 
called the “constant coefficients model,” imply- 
ing that the regression coefficients are constant 
across units and time periods. 

The problem with (2) is that even though 
it acknowledges repeated observations on the 
same units, it ignores that fact. In other words, 
(2) makes no attempt to model the repeated 
observations. The unit and time dimensions 
therefore need to be taken into account. This 
is where the fixed and random effects mod- 
els become viable options for dealing with the 
panel data structure. The fixed effects models 
extend (2) to: 


$$y_{it} = \alpha_i + \delta_t + \beta x_{it} + \varepsilon_{it} \tag{3}$$


Note that (3) is the same as (2), but that (3) 
adds separate intercepts for each unit (denoted 
by “$\alpha_i$”). Also note the addition of “$\delta_t$.” The 
“t” subscript denotes dummy variables for each 
time period. The logic for including separate 
intercepts for each unit and dummy variables 
for each time period will be discussed later 
because our concern here is with the differences 
between fixed and random effects estimation. 
The random effects model looks like this: 


$$y_{it} = \alpha + \beta x_{it} + u_i + w_t + \varepsilon_{it} \tag{4}$$


Note two important differences between (3) and 
(4). First, (4) adds $u_i$ and $w_t$, which specify 
separate error terms for unit and time period, 
respectively. Second, the subscript “i” is removed 
from $\alpha$. This model assumes that unobserved 
differences between units and time periods are 
random variables, compared with the assumption 
included in equation (3) that they are fixed. 
(4) is commonly called an error components 
model. 
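
The contrast between the pooled model (2) and the fixed effects model (3) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the panel dimensions, the true coefficient, and the way x is made to correlate with the unit effects are all invented for illustration and have nothing to do with the data analyzed later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 8                         # hypothetical panel: 50 units, 8 periods
unit = np.repeat(np.arange(N), T)    # unit index i for each stacked row
time = np.tile(np.arange(T), N)      # period index t for each stacked row

alpha_i = rng.normal(0.0, 2.0, N)    # time-stable unit effects
delta_t = rng.normal(0.0, 1.0, T)    # period shocks shared by all units
# Assumption: x is deliberately correlated with the unit effects
x = 0.8 * alpha_i[unit] + rng.normal(size=N * T)
beta_true = 1.5
y = alpha_i[unit] + delta_t[time] + beta_true * x + rng.normal(size=N * T)

def ols(X, y):
    """Return least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Equation (2): pooled model with a single intercept
X_pooled = np.column_stack([np.ones(N * T), x])
beta_pooled = ols(X_pooled, y)[1]

# Equation (3): unit dummies and time dummies (one of each omitted)
D_unit = (unit[:, None] == np.arange(1, N)).astype(float)
D_time = (time[:, None] == np.arange(1, T)).astype(float)
X_fe = np.column_stack([np.ones(N * T), x, D_unit, D_time])
beta_fe = ols(X_fe, y)[1]

print(f"true {beta_true:.2f}  pooled {beta_pooled:.2f}  fixed effects {beta_fe:.2f}")
```

In this setup the pooled slope absorbs part of the unit effects, while the dummy-variable specification recovers a value close to the coefficient used to generate the data.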

What, then, would lead a researcher to choose 
between fixed or random effects estimation? 
There are at least three considerations: (1) the 
size of N and T; (2) correlation between the 
error term and observed variables; and (3) vari- 
ation over time in the predictors. Each of these 
considerations is touched on in the next three 
subsections. The fourth subsection briefly intro- 
duces a more objective approach to deciding 
between fixed and random effects estimation, 
the use of a Hausman test. 


3.1 Size of N and T 


With respect to fixed effects models, when 
the number of units and/or time periods 
becomes large and unwieldy, degrees of 
freedom are sacrificed and efficiency is lost. 
Fixed effects regressions of data with repeated 
observations on, say, 5000 cities would require 
the addition of 4999 parameters to the model, 
simply to model city-specific effects. Add to 
that a number of time periods 
and efficiency suffers even more. If N and T 
are large, that does not automatically mean a 
researcher should abandon fixed effects esti- 
mation. Rather, some additional considerations, 
each related to efficiency, ought to weigh in to 
the decision. 

One such consideration is complicated inter- 
pretation. The researcher may wish, for sub- 
stantive reasons, to examine the parameter 
estimates for each unit (or time period). If 
such information is of no substantive con- 
cern, then perhaps a random effects approach 
would be preferable. Likewise, multicollinear- 
ity should be considered. The estimation of 
multiple parameters is often confounded by 
correlation among predictor variables. Random 
effects estimation may be preferable if multi- 
collinearity is a problem. 


Fixed effects estimation also sacrifices some 
possibly useful information. In particular, it 
removes any of the average unit-to-unit varia- 
tion from the analysis. The introduction of fixed 
effects for each unit, for example, simply asks 
whether intraunit changes in some dependent 
variable are associated with intraunit changes 
in one or more independent variables. In other 
words, fixed effects estimation ignores the pos- 
sibility that unit-to-unit variation sheds light on 
the relationship between x and y. 
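
The point that fixed effects estimation relies only on intraunit variation can be checked numerically. The sketch below, on simulated data with made-up dimensions, compares the dummy-variable estimate with the estimate obtained by subtracting each unit's mean from x and y (the "within" transformation); the two coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 30, 6                                   # hypothetical panel size
unit = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T) + rng.normal(size=N)[unit]
y = 2.0 * x + rng.normal(0, 3, N)[unit] + rng.normal(size=N * T)

def demean_by_unit(v):
    """Subtract each unit's own mean, leaving only intraunit variation."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

# Within estimator: regress demeaned y on demeaned x
x_w, y_w = demean_by_unit(x), demean_by_unit(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Dummy-variable estimator: x plus a full set of unit dummies (no constant)
D = (unit[:, None] == np.arange(N)).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

print(beta_within, beta_lsdv)
```

Because the unit means are removed before estimation, any difference in average levels between units plays no role in either estimate.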


3.2 No correlation between the error term 
and predictors 


It would seem that efficiency losses associated 
with fixed effects estimation would drive one to 
opt for random effects estimation. But random 
effects estimation suffers from certain faults and 
limitations, as well. In particular, it assumes 
that the error term is not associated with any 
of the predictor variables. In other words, the 
assumption is that the predictor variables are 
not correlated with unobserved unit-specific 
effects. Most researchers are hard-pressed to 
make such a case. 

Random effects models further assume that 
the random error terms are unique to each unit 
of analysis (see (4) above) and do not change 
over time. Why would something predictive 
of the outcome materialize at one point in 
time and then remain constant (see Berk, 2004, 
pp. 178-180)? Random effects estimation can 
also be desirable, however, when one key lim- 
itation of the fixed effects approach manifests 
itself. To that we now turn. 


3.3 Problematic predictors 


With respect to fixed effects estimation, there 
are three types of “problematic” predictor vari- 
ables that limit its use. The first is a predictor 
variable that does not vary over time, such as 
a variable denoting whether a person, city, or 
county is liberal or conservative. Such a vari- 
able would be perfectly collinear with dum- 
mies for each unit. Likewise, predictor variables 




that model events every unit experiences at 
the same time are perfectly correlated with the 
time dummies (see Allison, 1994). This does 
not mean, however, that events cannot be mod- 
eled in the fixed effects context. A later section 
of this chapter discusses modeling events with 
panel data. 
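
The collinearity problem can be made concrete with a rank check on a toy design matrix. Everything below is hypothetical (four units, three periods, an invented "conservative" indicator, and an invented event in the last period); it is a sketch of the logic rather than a diagnostic taken from the chapter.

```python
import numpy as np

N, T = 4, 3
unit = np.repeat(np.arange(N), T)
time = np.tile(np.arange(T), N)

# Full set of unit dummies (no separate constant) and T-1 time dummies
D_unit = (unit[:, None] == np.arange(N)).astype(float)
D_time = (time[:, None] == np.arange(1, T)).astype(float)
base = np.column_stack([D_unit, D_time])

conservative = np.array([1, 0, 0, 1])[unit].astype(float)  # never changes within a unit
shock = (time == 2).astype(float)                          # hits every unit in period 2
varying = np.random.default_rng(2).normal(size=N * T)      # varies over units and time

def adds_information(col):
    """True if the column is not a linear combination of the dummies."""
    with_col = np.column_stack([base, col])
    return np.linalg.matrix_rank(with_col) > np.linalg.matrix_rank(base)

for name, col in [("conservative", conservative), ("shock", shock), ("varying", varying)]:
    print(name, adds_information(col))
```

The time-invariant indicator and the common event add nothing to the column space of the dummies, so their coefficients cannot be estimated separately; the predictor that varies across both dimensions can.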

Predictors that change little over time are also 
problematic. Subtle changes from one year to 
the next can make a predictor variable look 
like a constant. As Beck and Katz (2004) have 
observed, “Fixed effects clearly eliminates any 
stable variables from the analysis, but also 
makes it difficult for variables that change 
only slowly to show their impact (when their 
impact is by and large inter- and not intrau- 
nit).” Researchers must then opt either for ran- 
dom effects estimation or, as we will see below, 
more sophisticated measures of interventions 
that are otherwise operationalized as dichoto- 
mous variables. 


3.4 The Hausman test 


Sometimes it is easier to choose between 
fixed and random effects estimation by using 
a Hausman test (see Hausman, 1978). This 
is a test of the null hypothesis that random 
effects would be consistent and efficient against 
the alternative hypothesis that random effects 
would be inconsistent. The question is whether 
there is significant correlation between the 
unobserved unit-specific random effects and 
the regressors. If there is no correlation, then the 
random effects model may be more powerful 
and parsimonious. 

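The test statistic is calculated as 
$(\beta_{FE} - \beta_{RE})^2 / (s^2_{\beta_{FE}} - s^2_{\beta_{RE}})$, 
where $\beta_{FE}$ are the fixed effects model 
coefficients, $\beta_{RE}$ are the random effects 
model coefficients, and $s^2_{\beta_{FE}}$ and $s^2_{\beta_{RE}}$ are 
the variances of the fixed and random effects 
model coefficients (with several predictors, the 
same quantity is computed as a quadratic form 
in the coefficient vectors and their covariance 
matrices). The statistic has a chi-square 
distribution with as many degrees of 
freedom as there are predictors in the model. An 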
insignificant p-value (greater than .05) means 
it is safe to use random effects. If the p-value 


is significant, however, fixed effects should be 
used. While it is tempting to take the test statis- 
tic at face value, as an “objective” criterion for 
choosing between one or the other approach, 
common sense should also be used. Substan- 
tive reasons may instead drive a researcher to 
choose random in lieu of fixed effects—or vice 
versa. Researchers should weigh all of the con- 
siderations raised throughout this section. 
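
For readers who want to see the arithmetic, the following sketch computes the Hausman statistic in its general quadratic-form version. The coefficient vectors and covariance matrices are invented numbers, not output from any model in this chapter, and the function name `hausman` is simply a label chosen here. The 5% critical values are standard chi-square table entries.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Return (statistic, degrees of freedom) for the Hausman test."""
    diff = np.asarray(b_fe) - np.asarray(b_re)
    V_diff = np.asarray(V_fe) - np.asarray(V_re)
    stat = float(diff @ np.linalg.solve(V_diff, diff))
    return stat, diff.size

# 5% critical values of the chi-square distribution for 1 to 3 df
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

# Hypothetical estimates for two predictors
b_fe = np.array([0.50, -0.20])
b_re = np.array([0.42, -0.15])
V_fe = np.diag([0.0040, 0.0025])
V_re = np.diag([0.0030, 0.0020])

stat, df = hausman(b_fe, b_re, V_fe, V_re)
prefer_fixed = stat > CHI2_CRIT_05[df]
print(f"H = {stat:.2f} on {df} df; reject random effects: {prefer_fixed}")
```

With these illustrative inputs the statistic exceeds the critical value, which would point toward fixed effects. In practice the difference of the two covariance matrices is not guaranteed to be positive definite in finite samples, which is one more reason not to treat the result mechanically.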


4 Estimation issues in fixed 
effects models 


The focus of the rest of this chapter will be 
on fixed effects models, in contrast to ran- 
dom effects models. Fixed effects models are 
more commonly used in aggregate-level models 
where researchers routinely estimate the effects 
of events such as new policies and legal inter- 
ventions. Recall, though, that there are situ- 
ations where fixed effects regressions should 
not be used (such as with constant predictor 
variables). 

There are five key estimation issues associ- 
ated with fixed effects regression models. Save 
for the first, the others also apply in the random 
effects context. The issues are (1) heterogeneity; 
(2) dynamics; (3) panel heteroskedasticity and 
contemporaneous correlation; (4) stationarity; 
and (5) trends. Pooling of time series and cross- 
section data often causes various combinations 
of these issues/problems to arise. 


4.1 Heterogeneity 


Heterogeneity refers, generally, to differences in 
the units of analysis. For example, in a study of 
California counties it is clear that there are dif- 
ferences from one county to the next that need 
to be modeled. Differences of this sort are no 
less apparent in the cross-sectional context, but 
researchers cannot expressly model such het- 
erogeneity given the lack of a time dimension. 
When a time dimension is added, however, 
researchers can expressly model unit-specific 
heterogeneity by estimating separate intercepts 
for each unit. This is accomplished through the 
introduction of dummy variables for each unit 
(minus one, which avoids the dummy variable 
trap). This can be seen via the “$\alpha_i$” in (3) above. 

More intuitively, unit-specific heterogeneity 
refers to time-stable characteristics of the units 
analyzed (Cornwell and Trumbull, 1994). One 
county, for example, may be more conservative 
than the next. Alternatively, one agency may be 
quite different from its counterpart. As Cherry 
(1999) illustrated, 


“...suppose there are two criminal justice depart- 
ments. Department A follows strict accounting prac- 
tices that report high percentages of crimes, while 
department B is more lenient and reports lower per- 
centages. Noting that certainty of sanctions is typi- 
cally measured by the clearance rate, this disparity 
causes a problem when the reported data from the 
two jurisdictions are analyzed” (p. 754). 


Since heterogeneity is concerned with “dif- 
ferences,” it should also be pointed out that 
time periods differ from one another. The pos- 
sibility exists that events occur in certain time 
periods that affect all of the units simultane- 
ously. One example could be a downturn in the 
economy. Another example could be a terrorist 
incident or other highly publicized violent inci- 
dent. To the extent such possibilities exist and 
affect the units included in the panel dataset, 
researchers should model them. This is accom- 
plished by the addition of dummy variables for 
each time period (less one). This can be seen 
via the “$\delta_t$” in (3) above. 

Panel data analysts often default to unit and 
time dummies, and sometimes include them in 
their models without any attention to whether 
they are truly needed. How is a researcher to 
decide whether unit and time heterogeneity 
should be controlled for? A simple F-test of 
the unit dummies, the time dummies, or both 
suffices. A significant value suggests they should 
be included in the model. In most instances the 
F-tests are significant, hence the common use of 
unit and time dummies in panel data analysis. 


It is also critical to point out that there are vari- 
ations on the dummy variable approach. One 
is the inclusion of linear trends, which will be 
covered shortly. 
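
The F-test for the unit dummies mentioned above compares the residual sum of squares of a model without the dummies to that of a model with them. A minimal sketch on simulated data follows; the panel size and effect sizes are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 20, 10                                  # hypothetical panel size
unit = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T)
y = 1.0 * x + rng.normal(0, 1.5, N)[unit] + rng.normal(size=N * T)

def ssr(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(resid @ resid)

X_r = np.column_stack([np.ones(N * T), x])             # restricted: no unit dummies
D = (unit[:, None] == np.arange(1, N)).astype(float)   # N-1 unit dummies
X_u = np.column_stack([X_r, D])                        # unrestricted

q = D.shape[1]                       # number of restrictions tested
df_resid = N * T - X_u.shape[1]      # residual degrees of freedom, unrestricted model
F = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / df_resid)
print(f"F({q}, {df_resid}) = {F:.2f}")
```

The statistic is compared with the F distribution on (q, df_resid) degrees of freedom; for example, `scipy.stats.f.sf(F, q, df_resid)` would give the p-value if SciPy is available. The same template applies to the time dummies, or to both sets jointly, by changing which columns are dropped in the restricted model.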


4.2 Dynamics 


Panel data are rarely independent across the 
time dimension. Researchers expect, and rou- 
tinely see, serial correlation (or temporal auto- 
correlation) in the data. These “dynamics” need 
to be modeled. Indeed, it is not uncommon for 
the values of a particular unit from one time 
period to be associated with values for the same 
unit from another time period (Hanushek and 
Jackson, 1977; Maddala, 1992). A good example 
concerns the public budgeting process. Accord- 
ing to Worrall and Pratt (2003), 


“It is commonly understood that when public agen- 
cies do not spend their budgetary allotments for a 
particular year, their budgets can be reduced in sub- 
sequent years. Thus, the very nature of the public 
budgeting process ensures that an agency’s budget 
for one year is highly associated with its budget for 
the previous year” (p. 89). 


There are several tests available for detect- 
ing autocorrelation in panel data. Perhaps the 
most straightforward is the panel data analog 
of the standard Lagrange multiplier test. This is 
accomplished by estimating the OLS regression 
equation and then regressing the residuals on 
all of the independent variables and the lagged 
residual. If the coefficient on the lagged residual 
is significant, then the researcher can conclude 
autocorrelation exists. The test can be modified 
to handle more complex dynamic processes. 
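
The two-step procedure just described can be sketched as follows. The data are simulated with first-order autocorrelated errors (the value 0.6 is an arbitrary choice), and the lag of the residual is taken within each unit so that the first period of every unit drops out of the second regression.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, rho = 40, 12, 0.6                        # hypothetical panel and AR(1) parameter
x = rng.normal(size=(N, T))
e = np.zeros((N, T))
e[:, 0] = rng.normal(size=N)
for t in range(1, T):
    e[:, t] = rho * e[:, t - 1] + rng.normal(size=N)
y = 0.5 + 1.0 * x + e

# Step 1: OLS on the stacked data, keep residuals arranged as units x periods
X = np.column_stack([np.ones(N * T), x.ravel()])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
resid = (y.ravel() - X @ b).reshape(N, T)

# Step 2: regress residuals on the regressors and the lagged residual
Z = np.column_stack([
    np.ones(N * (T - 1)),
    x[:, 1:].ravel(),
    resid[:, :-1].ravel(),                     # residual lagged one period within unit
])
u = resid[:, 1:].ravel()
g = np.linalg.lstsq(Z, u, rcond=None)[0]
res2 = u - Z @ g
sigma2 = res2 @ res2 / (len(u) - Z.shape[1])
se_lag = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[2, 2])
t_lag = g[2] / se_lag
print(f"coefficient on lagged residual = {g[2]:.3f}, t = {t_lag:.1f}")
```

A large t-ratio on the lagged residual, as here, signals serial correlation. Adding further lags of the residual to the second regression is one way to probe higher-order processes.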

Two common methods are used to correct 
for autocorrelation, if it is present. One is 
through the introduction of one or more lags 
of the dependent variable. Political scientists 
routinely use this approach, especially in light 
of Beck and Katz’s (1995) important study on 
the subject. A number of criminologists have 
done the same (e.g., Marvell and Moody, 1996; 
Kovandzic, Sloan, and Vieraitis, 2002). The 
upside of this approach is twofold. One is that 
it expressly models autocorrelation via a coeffi- 
cient that can be interpreted. Another is that it 
has the benefit of controlling for omitted lagged 
effects (Marvell and Moody, 1996). 

A problem with including lagged dependent 
variables to control for autocorrelation is that 
they are probably correlated with the error term. 
This is particularly problematic when the time 
series is short (Hsiao, 1986). Also, lagging the 
dependent variable potentially results in lost 
observations (e.g., the lagged dependent vari- 
able at time 1 can no longer be treated as a 
dependent variable because there is no previous 
value of the variable to act as a (lagged) predic- 
tor). As an alternative, some researchers opt for 
regressions that include first-order autoregressive 
disturbance terms. These are arrived at by 
estimating $\rho$ from $\beta$ in a residual regression of 
$u_{it} = \beta u_{it-1} + \eta_{it}$ (Mundlak, 1978; Hsiao, 1986). 
Most statistics packages have a routine for such 
estimation (e.g., Stata’s xtregar command). 

It is worth digressing for a moment to further 
consider the issue of dynamics in the context 
of a short time series. When the time series is 
short, as we saw, controlling for autocorrelation 
with lagged dependent variables gobbles up 
degrees of freedom. Doing so also downwardly 
biases the coefficient on the lagged dependent 
variable, a problem known as Nickell bias, in 
reference to the individual who identified it 
(Nickell, 1981). Ideally, a panel dataset should 
have many time periods relative to units, but 
this is not always possible. Several criminolo- 
gists have seen fit to ignore the issue altogether 
(e.g., Greenberg and West, 2001; Zhao, Scheider 
and Thurman, 2002). 
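
The downward bias in the lagged dependent variable coefficient is easy to reproduce by simulation. The sketch below assumes a simple dynamic panel with unit effects, a true lag coefficient of 0.5, and only five usable time periods; all of these values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, rho = 500, 5, 0.5                        # many units, short time series
alpha = rng.normal(size=N)                     # unit effects
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - rho) + rng.normal(size=N)   # start near each unit's long-run mean
for t in range(1, T + 1):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=N)

y_lag, y_cur = y[:, :-1], y[:, 1:]

# Fixed effects via the within transformation (subtract unit means)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
rho_fe = (yl * yc).sum() / (yl ** 2).sum()

print(f"true rho = {rho:.2f}, fixed effects estimate = {rho_fe:.2f}")
```

Even with 500 units the estimate falls well short of the true value, because the bias depends on the number of time periods rather than the number of units. Rerunning the sketch with a larger T shrinks the gap.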


4.3 Panel heteroskedasticity 
and contemporaneous correlation 


Two additional problems that arise in the 
panel data context have been labeled panel 
heteroskedasticity and contemporaneous corre- 
lation (Beck and Katz, 1995). The first refers 


to unit-to-unit variances in the errors. To illus- 
trate, the scale of the dependent variable, such 
as the crime rate, can differ markedly across 
units, something that could need account- 
ing for. Franzese (2002) proposed a simple 
method for detecting panel heteroskedasticity. 
It requires regressing the absolute values of the 
OLS residuals on the X variable that is thought 
to be closely associated with the errors. In the 
case of panel data, the unit-specific dummy 
variables (minus one) are those most likely to 
be associated with the error term. A significant 
F-test for those variables is indicative of het- 
eroskedasticity. 

The second problem, contemporaneous cor- 
relation, refers to correlated errors between two 
or more (though not necessarily all) units at 
the same time. This can occur when some 
development in one unit is linked in some 
fashion to another unit. A panel data analysis 
of several counties from several states might 
face this problem if, say, an event in one 
state affected all or several of the counties in 
that state at the same time. One could simply 
model such effects with state (in addition to 
county) dummies, but the resulting models can 
quickly become unwieldy and result in addi- 
tional losses to degrees of freedom. The test for 
detecting contemporaneous correlation is some- 
what more complicated than the one for detect- 
ing panel heteroskedasticity, but it is described 
in detail in Breusch and Pagan (1980). Stata’s 
user-written xttest2 command is also helpful. 

How is one to deal with panel heteroskedas- 
ticity and/or contemporaneous correlation? 
Some researchers have ignored it altogether, 
but one of the more significant advances 
from research in political science is Beck 
and Katz’s (1995) “panel corrected standard 
errors” (PCSEs) approach. They proposed a 
relatively simple method for estimating panel 
data models with errors corrected for the panel 
data problems just highlighted. Their Monte 
Carlo simulations showed that PCSEs are accu- 
rate in the presence of both contemporaneous 
correlation and panel heteroskedasticity. Sev- 
eral statistics packages now contain routines 
for estimating models with panel corrected 
standard errors. 
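
The core of the panel corrected standard error calculation can be written in a few lines. The sketch below follows the general form of the Beck and Katz estimator for a balanced panel, using OLS residuals to estimate the unit-by-unit error covariance matrix. The simulated data, the variable names, and the omission of any degrees-of-freedom adjustment are assumptions made here; packaged routines offer further options that are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 6, 40                                   # hypothetical balanced panel
sd_i = np.linspace(0.5, 2.0, N)                # unit-specific error scales
common = rng.normal(size=T)                    # shock shared by all units each period
e = sd_i[:, None] * (0.7 * common[None, :] + rng.normal(size=(N, T)))
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + e

# Stack unit by unit: all periods for unit 0, then unit 1, and so on
X = np.column_stack([np.ones(N * T), x.ravel()])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
E = (y.ravel() - X @ b).reshape(N, T)          # residuals, one row per unit

Sigma = E @ E.T / T                            # N x N contemporaneous covariance
Omega = np.kron(Sigma, np.eye(T))              # matches the unit-by-unit stacking
XtX_inv = np.linalg.inv(X.T @ X)
V_pcse = XtX_inv @ X.T @ Omega @ X @ XtX_inv   # sandwich covariance of b
se_pcse = np.sqrt(np.diag(V_pcse))

# Conventional OLS standard errors for comparison
sigma2 = (E ** 2).sum() / (N * T - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))
print("OLS s.e.: ", se_ols)
print("PCSE s.e.:", se_pcse)
```

The coefficients themselves are unchanged; only the standard errors differ. In this simulation the shared shock makes the conventional standard error on the intercept too small, which the corrected version picks up.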


4.4 Nonstationarity 


Another panel data issue concerns stationarity. 
Panel data must be stationary. In formal terms, 
the means, variances, and autocovariances 
(at various lags) of the series remain constant 
across all time points. An augmented Dickey- 
Fuller test can be performed to detect station- 
arity/nonstationarity. This is accomplished by 
regressing the first-differenced dependent vari- 
able on the one-period lag of the dependent 
variable and lags of the first-differenced depen- 
dent variables (Enders, 1995). 
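
Written out, the regression just described takes the following form. This is a sketch in notation introduced here for illustration; the symbols $a$, $\gamma$, $\phi_j$, and the lag length $p$ are not the chapter's own.

$$\Delta y_{it} = a + \gamma\, y_{i,t-1} + \sum_{j=1}^{p} \phi_j\, \Delta y_{i,t-j} + e_{it}$$

The null hypothesis of a unit root (nonstationarity) corresponds to $\gamma = 0$, and it is rejected when the t-ratio on $\gamma$ is sufficiently negative. Because that ratio does not follow the usual t distribution under the null, it is compared with Dickey-Fuller critical values rather than standard ones.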


4.5 Trends 


As pointed out already, the typical fixed effect 
regression adds unit and year dummies (this 
is known as a “full fixed effects” or “two- 
way fixed effects” model). An extension of this 
approach used by some researchers is to include 
separate deterministic trends for each unit of 
analysis. These trends (coded from one to T 
for each unit) control for trends in each unit 
that depart from the annual shocks captured by 
the year dummies (see, e.g., Black and Nagin, 
1998; Marvell and Moody, 1996; Worrall, 2005). 
The trends amount to proxies for factors that 
make values of the dependent variable in one 
unit grow more or less than the other units as a 
whole. That is, they model departures from the 
norm. The problem, though, is that they can be 
highly collinear with other variables that also 
trend upward. 
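
Constructing unit-specific trends is mostly a matter of bookkeeping, as the toy example below suggests (three units and four years, chosen only to keep the matrices small). It also illustrates one practical detail that is an observation added here rather than a claim from the chapter: when year dummies are already in the model, the full set of unit trends sums to a common trend that the year dummies already span, so one unit's trend has to be omitted.

```python
import numpy as np

N, T = 3, 4                                    # toy panel: 3 units, 4 years
unit = np.repeat(np.arange(N), T)
t = np.tile(np.arange(1, T + 1), N)            # trend coded 1..T within each unit

const = np.ones(N * T)
D_unit = (unit[:, None] == np.arange(1, N)).astype(float)      # N-1 unit dummies
D_year = (t[:, None] == np.arange(2, T + 1)).astype(float)     # T-1 year dummies

# One trend column per unit: equals t in that unit's rows, 0 elsewhere
trends = (unit[:, None] == np.arange(N)) * t[:, None]

X_all = np.column_stack([const, D_unit, D_year, trends])          # all N trends
X_drop = np.column_stack([const, D_unit, D_year, trends[:, 1:]])  # one trend omitted

print(trends[:, 0])
print(np.linalg.matrix_rank(X_all), X_all.shape[1])
print(np.linalg.matrix_rank(X_drop), X_drop.shape[1])
```

With all trends included the design matrix has one more column than its rank; dropping a single trend restores full column rank.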

As a twist on the trends-for-each-unit 
approach, some researchers have used a single 
trend variable (not tied to any specific unit) to 
model the incapacitative effect of crime con- 
trol legislation. A common problem for pol- 
icy researchers has been to separate deterrent 
effects (the individual chooses not to commit 


crime, for example, because of fear of punish- 
ment) from incapacitative effects (the individ- 
ual is rendered—usually physically—incapable 
of committing crime, e.g., by imprisonment) 
of such policies on crime. An early approach, 
taken by Kessler and Levitt (1999), involved 
examining the crime rate immediately follow- 
ing the passage of a new policy. Since inca- 
pacitation is unlikely to influence crime in the 
short run (since actual capture and imprison- 
ment would presumably not be immediate, but 
would take some time to implement) any result- 
ing decline in crime would have to result from 
deterrence (fear of capture and imprisonment 
as a result of the new policy). 

A more recent approach, taken by Marvell 
and Moody (2001), was to include a linear trend 
starting at the time a new policy is passed (in 
addition to other variables). In their view, an 
incapacitation trend variable “...assumes that, 
in the absence of the law, very few defen- 
dants would have escaped prison sentences, so 
that the incapacitation effect grows over time” 
(p. 103). Researchers can get even more sophis- 
ticated with the use of trend variables. Various 
combinations of interactions and even nonlin- 
ear trends can be modeled in an effort to capture 
interesting and complicated relationships. 
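
A post-policy trend of this kind is straightforward to code alongside the usual dichotomous indicator. In the sketch below the passage year (1994) is hypothetical, and starting the trend at 1 in the year of passage is one possible coding convention assumed here, not necessarily the one used by Marvell and Moody.

```python
import numpy as np

years = np.arange(1990, 1999)      # 1990 through 1998
policy_year = 1994                 # hypothetical year the law is passed

# Step indicator: 0 before passage, 1 from the passage year onward
law_dummy = (years >= policy_year).astype(int)

# Trend: 0 before passage, then 1, 2, 3, ... so the effect can grow over time
law_trend = np.clip(years - policy_year + 1, 0, None)

for yr, d, tr in zip(years, law_dummy, law_trend):
    print(yr, d, tr)
```

Including both variables lets the step indicator pick up an immediate shift (more plausibly deterrence) while the trend picks up an effect that accumulates (more plausibly incapacitation).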


5 A practical example: welfare 
spending and crime 


To illustrate several of the issues raised thus 
far, let us revisit the author’s recent study on 
the relationship between welfare spending and 
crime (Worrall, 2005). The data for that study, 
which are also used here, consisted of yearly 
observations from 1990 to 1998 for all 58 coun- 
ties in California. We will more or less repli- 
cate the results from that study here, but with 
an eye toward understanding, first, what hap- 
pens when data are simply pooled ignoring the 
distinction between the time series and cross- 
sectional components. Then we will progres- 
sively account for the realities of panel data 
by first introducing fixed effects, then adding 
trends, and finally adding panel corrected stan- 
dard errors. Dynamics will be dealt with via a 
lagged dependent variable, and it should also be 
pointed out that the data are stationary accord- 
ing to the augmented Dickey-Fuller test. The 
analyses also will be based on logged variables 
to minimize the effects of outliers (e.g., Marvell 
and Moody, 1996). 

The analyses reported below are more than 
illustrative. They offer a critical take on a 
wealth of previous research aimed at detecting 
the relationship between welfare spending and 
crime (see Worrall, 2005, for a review). Most 
such studies have employed cross-sectional 
designs and, in doing so, have found an inverse 
relationship between welfare spending and 
crime. Panel data, however, permit researchers 
to control for various unobserved time-stable 
(and time-period-specific) effects. But pooling 
alone does not ensure replication of results from 
OLS performed on cross-sectional data. As will 
become clear, when the data for the present 
analysis are pooled, a previously inverse 
relationship becomes positive. That is, there is 
more crime in areas characterized by higher 
levels of welfare spending. Why? Panel het- 
eroskedasticity is a likely candidate. Omitted 
variable bias is another (for additional expla- 
nations see Kennedy, 2002). Panel data models 
help get around these very problems. As will 
become clear, fixed effects modeling causes the 
inverse relationship to reappear. 


5.1 Variables 


The dependent variables in the analyses reported 
below consist of the rates per 100,000 people 
of homicide, robbery, assault, burglary, 
and larceny. There were no homicides in 
80 county/years, so a value of .5 was added 
to the homicide variable. The measure of welfare 
spending is the cost-of-living adjusted Aid 
to Families with Dependent Children (AFDC) 
annual payment per recipient. A similar mea- 
sure has been used in a number of similar 


studies on this subject (e.g., DeFronzo, 1983; 
DeFronzo and Hannon, 1998). The indepen- 
dent variables are population mobility (Mobil- 
ity), the poverty level (Poverty), the percentage 
of single mother households (Mothers), popu- 
lation density (Density), percent Black (Black), 
percentage of young males between the ages of 
13 and 17 (Male 1), and the percentage of young 
males between the ages of 18 and 25 (Male 2) 
(see Worrall, 2005, for additional details as well 
as summary statistics). 


5.2 Uncorrected OLS model 


The results reported in Table 15.1 are from 
a pooled OLS regression model. The results 
are equivalent to those of a basic OLS model 
with the only difference being repeated observa- 
tions on each of the units. The first coefficients 
reported in Table 15.1 are the lagged dependent 
variables (to control for autocorrelation). All are 
significant and positive, as expected. Welfare 
spending, however, is not inversely associated 
with crime in the pooled model. Addition- 
ally, the relationship is significant for homi- 
cide, assault, burglary, and larceny. This stands 
in contrast to the bulk of previous research on 
the subject (see, e.g., Worrall, 2005). Most past 
studies show less crime in areas characterized 
by high welfare spending, but pooling the data 
as we have here, without regard to the fact that 
there are repeated observations on each unit, is 
the likely explanation for this finding. In short, 
we cannot make much of the results reported 
in Table 15.1 because the model is incorrectly 
specified, but it does at least provide a base 
for comparison. The next step is to add fixed 
effects. 


5.3 Two-way fixed effects model 


Table 15.2 reports the results of models with 
county and year fixed effects (i.e., dummy vari- 
ables) added to the models. Two observations 
are apparent. First, all but one of the AFDC coef- 
ficients shifted their signs, consistent with most 
previous studies. What’s more, all but one lost 
Table 15.1 Uncorrected OLS model 

                  Homicide   Robbery    Assault    Burglary   Larceny
Lagged dep. var.  0.467      0.591      0.754      0.812      0.943
                  (12.02)*   (16.64)*   (25.83)    (28.79)*   (57.40)*
AFDC              0.671      0.281      0.208      0.246      0.224
                  (2.96)**   (0.97)     (2.12)*    (4.10)**   (5.37)*
Mobility          -0.309     -1.062     -0.111     -0.123     -0.009
                  (1.13)     (2.85)**   (0.91)     (1.55)     (0.17)
Poverty           0.424      -0.089     0.085      0.052      0.001
                  (3.34)**   (0.55)     (1.57)     (1.48)     (0.04)
Mother            -0.199     1.752      0.256      0.134      0.045
                  (0.76)     (4.65)**   (2.18)*    (1.91)     (0.86)
Density           0.007      0.166      -0.010     -0.007     -0.002
                  (0.25)     (4.40)**   (0.84)     (0.92)     (0.45)
Black             0.041      0.030      -0.028     -0.014     -0.008
                  (1.09)     (0.61)     (1.72)     (1.36)     (1.12)
Male 1            0.541      -0.302     -0.192     0.032      -0.001
                  (1.72)     (0.73)     (1.39)     (0.37)     (0.02)
Male 2            -0.139     -0.466     -0.010     -0.063     0.009
                  (1.00)     (2.50)*    (0.16)     (1.62)     (0.33)
Constant          -9.128     0.124      -2.871     -2.213     -1.825
                  (3.60)     (0.04)     (2.73)     (3.39)     (4.01)
Observations      513        513        513        513        513
R-squared         0.39       0.81       0.67       0.78       0.91

Notes: Absolute value of t statistics in parentheses. 
* significant at 5%; ** significant at 1%. 
All F-statistics were significant at the p = .001 level. All variables expressed as per 
capita logarithms. County and year dummy output suppressed. 


their significance (compared to Table 15.1). It 
appears there may be a significant inverse rela- 
tionship between robbery and welfare spend- 
ing. Importantly, though, the models reported 
in Table 15.2 make no other corrections for the 
panel data structure besides (1) adding fixed 
effects and (2) controlling for autocorrelation. 
Other problems, such as panel heteroskedastic- 
ity, contemporaneous correlation, and trends, 
are ignored. The following sections consider the 
influence of such problems. 


5.4 PCSEs 


Table 15.3 extends the models reported in the 
previous section but adds panel corrected stan- 
dard errors, to correct for panel heteroskedas- 
ticity and contemporaneous correlation (which, 
incidentally, were present in the data, based 
on the tests reported earlier). Even after PCSEs 
were added to the models (via Stata’s xtpcse 
command), however, the AFDC coefficient for 
robbery remained negative and significant. It 
would appear, then, that welfare spending may 
Table 15.2 Fixed effects added 

                  Homicide   Robbery    Assault    Burglary   Larceny
Lagged dep. var.  -0.090     -0.129     0.416      0.363      0.462
                  (1.93)     (2.74)**   (10.00)**  (8.26)**   (10.74)*
AFDC              -0.688     -3.588     -0.141     0.234      -0.193
                  (0.83)     (3.43)     (0.35)     (1.02)     (1.17)
Mobility          -0.099     0.259      0.784      0.000      0.521
                  (0.10)     (0.22)     (1.68)     (0.00)     (2.74)*
Poverty           -2.291     4.532      -0.010     0.192      0.225
                  (3.04)**   (4.76)**   (0.03)     (0.92)     (1.49)
Mother            0.476      -1.771     0.138      0.457      -0.111
                  (0.71)     (2.08)*    (0.42)     (2.44)*    (0.83)
Density           -0.672     -1.855     -1.180     -0.291     0.105
                  (0.64)     (1.40)     (2.30)*    (0.99)     (0.51)
Black             -0.325     -0.292     -0.302     -0.659     -0.117
                  (0.88)     (0.63)     (1.69)     (6.41)*    (1.58)
Male 1            0.129      0.346      -0.925     -0.329     0.119
                  (0.26)     (0.55)     (3.67)**   (2.33)*    (1.20)
Male 2            -0.087     0.657      0.189      0.324      0.029
                  (0.37)     (2.24)*    (1.68)     (4.91)*    (0.62)
Constant          5.184      20.817     0.619      -4.853     -4.226
                  (0.42)     (1.35)     (0.10)     (1.43)     (1.79)
Observations      513        513        513        513        513
R-squared         0.64       0.90       0.77       0.86       0.94

Notes: Absolute value of t statistics in parentheses. 
* significant at 5%; ** significant at 1%. 
All F-statistics were significant at the p = .001 level. All variables expressed as per 
capita logarithms. County and year dummy output suppressed. 


be associated with reductions in serious crime. 
This finding stands in contrast to the author’s 
2005 study (Worrall, 2005), for two reasons. 
First, that study controlled for autocorrelation 
by estimating AR(1) disturbance terms in lieu 
of lagging the dependent variable. Second, the 
2005 study included county-specific trends, the 
logic of which was discussed earlier. If we add 
such trends, as the models reported in the fol- 
lowing section do, the effect of welfare spend- 
ing on crime disappears altogether. 


5.5 Thinking about trends 


Table 15.4 extends the models from the previ- 
ous section but with the addition of separate 
county-specific trends (to model departures 
from statewide trends). As can be seen in 
Table 15.4, there appears to be no effect of 
welfare spending on serious crime. This is 
what the author reported in Worrall (2005). 
AFDC, of course, is not the only measure of 
welfare spending. When additional measures 
are included, as the author did in the 2005 
Table 15.3 PCSEs added 

                  Homicide   Robbery    Assault    Burglary   Larceny
Lagged dep. var.  -0.090     -0.129     0.416      0.363      0.462
                  (0.75)     (0.71)     (4.01)**   (3.11)**   (3.71)*
AFDC              -0.688     -3.588     -0.141     0.234      -0.193
                  (0.75)     (3.37)*    (0.43)     (1.10)     (1.23)
Mobility          -0.099     0.259      0.784      0.000      0.521
                  (0.12)     (0.38)     (2.18)*    (0.00)     (3.51)*
Poverty           -2.291     4.532      -0.010     0.192      0.225
                  (3.58)**   (3.13)**   (0.03)     (0.93)     (1.98)*
Mother            0.476      -1.771     0.138      0.457      -0.111
                  (0.92)     (3.01)**   (0.31)     (1.77)     (0.90)
Density           -0.672     -1.855     -1.180     -0.291     0.105
                  (0.68)     (2.00)*    (1.97)*    (0.97)     (0.44)
Black             -0.325     -0.292     -0.302     -0.659     -0.117
                  (1.11)     (0.81)     (1.19)     (3.39)**   (0.90)
Male 1            0.129      0.346      -0.925     -0.329     0.119
                  (0.33)     (0.49)     (2.18)*    (1.27)     (0.58)
Male 2            -0.087     0.657      0.189      0.324      0.029
                  (0.39)     (1.75)     (1.46)     (5.51)*    (0.47)
Constant          5.184      20.817     0.619      -4.853     -4.226
                  (0.52)     (1.57)     (0.09)     (1.52)     (1.57)
Observations      513        513        513        513        513

Notes: Absolute value of t statistics in parentheses. 
* significant at 5%; ** significant at 1%. 
All F-statistics were significant at the p = .001 level. All variables expressed as per 
capita logarithms. County and year dummy output suppressed. 


study, even more evidence suggests that wel- 
fare spending has little to no effect on serious 
crime. The author also checked the robustness 
of the findings through other means (such as 
by excluding various combinations of variables) 
and even by using a linear specification, each of 
which pointed to the independence of welfare 
spending and crime. This finding is important, 
once again, because it flies in the face of scores 
of previous studies. Panel data provide an inter- 
esting opportunity to model heterogeneity, and 
it appears that once such characteristics are 
modeled, and once an appropriate specification 


is used, a more complete story can be told. As 
the author concluded in the 2005 study, 


“The finding suggests that the relationship between 
welfare spending and macro-level crime rates is prac- 
tically nonexistent. This is likely due to statistical 
controls for unobserved heterogeneity and the intro- 
duction of dummy variables for each year (minus 
one)” (Worrall, 2005, p. 365). 


Researchers often specify their panel data 
models with little more than lip service to 
the issues presented when time series and 
cross-section data are pooled. The steps taken 
Table 15.4 Trends added

                   Homicide   Robbery    Assault    Burglary   Larceny
Lagged dep. var.   -0.226     -0.319     0.119      0.014      0.110
                   (1.84)     (1.93)     (0.97)     (0.12)     (0.79)
AFDC               -0.859     -1.428     0.051      0.338      -0.205
                   (0.88)     (1.36)     (0.09)     (1.22)     (0.93)
Mobility           3.995      -5.967     10.060     6.397      -1.903
                   (0.39)     (0.41)     (1.19)     (1.83)     (0.51)
Poverty            10.282     -34.367    -4.532     -1.060     2.606
                   (0.60)     (1.00)     (0.66)     (0.20)     (1.29)
Mother             8.263      0.490      -6.250     -3.581     -1.440
                   (1.75)     (0.13)     (1.81)     (1.36)     (0.95)
Density            -2.594     0.310      -1.332     -0.130     -0.465
                   (1.88)     (0.14)     (1.66)     (0.31)     (1.36)
Black              0.123      0.292      0.235      -0.355     -0.395
                   (0.22)     (0.83)     (0.50)     (1.12)     (1.36)
Male 1             0.815      1.085      -0.235     -0.118     0.140
                   (0.89)     (0.90)     (0.43)     (0.34)     (0.60)
Male 2             -2.230     -3.346     -0.039     0.228      0.677
                   (1.66)     (1.82)     (0.06)     (0.69)     (2.44)*
Constant           -4.859     75.260     -49.732    -35.539    2.991
                   (0.10)     (0.86)     (1.30)     (1.65)     (0.19)
Observations       513        513        513        513        513

Notes: Absolute value of t statistics in parentheses.
* significant at 5%; ** significant at 1%. All F-statistics were significant at the p = .001 level.
All variables expressed as per capita logarithms. County and year dummy output suppressed.


here were partly for illustration, but they were 
also arguably necessary. First, the basic OLS
model returned some strange results: several
positive and significant relationships between
welfare spending and crime. Fixed effects
cleared that up. But the data also displayed
panel heteroskedasticity and contemporaneous
correlation, hence the addition of panel-corrected
standard errors. Finally, models were
estimated with county-specific trends, which
eliminated the significant relationship between
welfare spending and robbery. If the trends were
not necessary, then it still appears that welfare
spending has almost no effect on serious


crime (save for robbery). If people commit 
crime for sustenance and welfare payments dis- 
suade such behavior (DeFronzo, 1983), then 
one would expect reductions in burglary and 
larceny as well, but the coefficients in those 
models were never significant, regardless of the 
specification. 


6 Simple extensions 


The previous section provided a simple illustration
of pooling time series and cross-section
data and how coefficients can change
depending on the extent to which specification
issues are addressed. Controlling for heterogeneity
and adding unit-specific time trends
appeared to have the most pronounced effect.
But there are still other steps researchers can 
take to extend panel data models such that they 
tell a more complete story. This section briefly 
introduces such extensions, but not by way of 
example. Instead, what is presented here is a 
simple conceptual overview (with some how- 
to) of three extensions: (1) dealing with simul- 
taneity; (2) estimating the effects of events; and 
(3) exploring unit-specific effects. Each step has 
been taken in a number of criminological stud- 
ies. The intent here is to introduce them and 
describe how and why one might want to pur- 
sue them. 


6.1 Dealing with simultaneity 


Returning briefly to the welfare spending-crime
discussion in the previous section, Piven and 
Cloward (1971) have argued that welfare pro- 
grams have at their core a sinister motive to 
control the poor. If crime is committed dispro- 
portionately by poor persons (as it appears to 
be), then logic suggests governments may alter 
welfare benefits in an effort to keep crime, and 
therefore the poor, in check. Put another way, 
governments may use crime as a consideration 
in setting welfare benefits. This is but one vari- 
ation of the so-called simultaneity problem, a 
two-way relationship between the independent 
and dependent variables. 

Welfare spending may reduce crime, but 
crime may affect welfare spending. If one is not 
convinced of the connection here, then consider 
police spending and crime. It is much more log- 
ical to conclude that police levels, either offi- 
cers per citizen or police spending per citizen, 
may reduce crime. It is also quite logical to 
conclude that crime may cause units of govern- 
ment to alter police levels. Getting to the bot- 
tom of such two-way relationships is of critical 
importance. 

OLS regression provides researchers 
with only one basic method of addressing 


simultaneity: instrumental variable regression 
as implemented in two-stage least squares. It 
is just as viable in the panel data context and 
has been used with great frequency, even in 
the police levels-crime literature (for recent
examples, see General Accounting Office, 2005; 
Evans and Owens, 2005). Another approach, 
only available in the panel data or time series 
context, is the Granger causality test (Granger, 
1969; Pindyck and Rubinfeld, 1991, pp. 
216-219). The Granger causality test has been 
used by a number of criminologists, including 
Marvell and Moody (1996) and Kovandzic 
et al. (2002). The test involves two separate 
autoregressive analyses. Let us consider the 
example of police levels and crime. 

The first step is to regress crime on lags 
of itself and lags of police levels (along with 
pertinent controls). The lagged variables are 
dropped if they are not significant. If lags of 
police levels are significant (as determined by 
an F-test), police levels “Granger cause” crime. 
The second test is the opposite, with police 
levels serving as the dependent variable. The 
model includes lags of police levels and crime. 
If lagged crime is significant in such a model, 
crime “Granger causes” police. There is one 
key limitation of the Granger causality test, 
notably its inability to detect instantaneous 
effects. That is, because the test relies on lags, 
in our example it would be impossible to deter- 
mine whether police levels affect crime (or vice- 
versa) in the same time period. However, the 
lack of a lagged impact implies the lack of a 
current impact because lagged crime is likely to 
be correlated with current crime through serial 
correlation. 
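
The two-step logic of the test can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it uses a single simulated series rather than a pooled panel (a panel application would add unit and year dummies and the pertinent controls), the names `granger_f`, `lagged`, and `simulate` are hypothetical, and the data are invented so that police levels affect later crime but not the reverse.

```python
import numpy as np

def lagged(series, lags):
    """Matrix whose column k holds the series lagged k + 1 periods."""
    n = len(series)
    return np.column_stack([series[lags - k - 1 : n - k - 1] for k in range(lags)])

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f(y, x, lags=2):
    """F statistic for H0: lags of x add nothing once lags of y are included."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(y, lags)])        # own lags only
    unrestricted = np.hstack([restricted, lagged(x, lags)])  # add lags of x
    rss_r, rss_u = rss(target, restricted), rss(target, unrestricted)
    df_resid = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_resid)

def simulate(n=400, seed=0):
    """Invented data: lagged police lowers crime; crime never feeds back."""
    rng = np.random.default_rng(seed)
    police, crime = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        police[t] = 0.5 * police[t - 1] + rng.normal()
        crime[t] = 0.3 * crime[t - 1] - 0.8 * police[t - 1] + rng.normal()
    return police, crime

police, crime = simulate()
print("police -> crime F:", granger_f(crime, police))
print("crime -> police F:", granger_f(police, crime))
```

The first statistic corresponds to the first step described above (crime regressed on lags of itself and of police levels), the second to the reverse regression. Each would be compared with an F distribution with `lags` and `df_resid` degrees of freedom.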


6.2 Estimating the effects of events 


Panel data models have long been consid- 
ered some of the best designs for the study 
of causation next to a purely random exper- 
iment. For example, Campbell and Stanley 
(1967, pp. 55-57) refer to panel models as 
“excellent quasi-experimental design[s], perhaps
the best of the more feasible designs.”
Lempert (1966, pp. 130-131) stated that panel 
designs are research designs “par excellence.” 
Still other researchers have argued that panel 
techniques are well-suited to causal analysis 
(e.g., Stimson, 1985; Hsiao, 1986). One of the 
reasons panel data designs are regarded in this 
manner is because they can be used to estimate 
the effects of events, not unlike a classic exper- 
iment. Treatment (units which experience the 
event) and control (units which do not expe- 
rience the event) can be included in the same 
analysis. 

Estimating the effects of events with panel 
data is quite straightforward. The researcher 
has at least three options at his/her disposal. 
The first is to measure the presence or absence 
of an event with a 1/0 dummy variable. The 
event could be something that unfolds during 
the time span covered by the data, or it could 
be the presence or absence of some condition 
(such as a law). The only limitation to this 
approach is that it cannot be employed when 
(1) other time invariant variables are included 
in the model and (2) every unit experiences the 
event at the same time (see Allison, 1994). The 
second approach is to measure the event in lev- 
els. For example, Zhao et al. (2002) used panel 
data to assess the effects of COPS spending on 
crime. Rather than a 1/0 dichotomy for the pres- 
ence or absence of a COPS grant, their main 
variable was an estimate of the amount of grant 
funds. The third approach is to adopt more 
sophisticated coding schemes, particularly in 
cases where researchers are interested in more 
than the presence or absence of some event or 
condition. Various linear and nonlinear cod- 
ing schemes are possible (see Allison, 1994, 
pp. 185-187). 
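
The first option, a 1/0 event dummy in a model with unit and time effects, can be sketched as follows. The example is hypothetical throughout: the panel is simulated, the "law" is adopted by half of the units from period 5 onward, and the function names are illustrative rather than taken from any of the studies cited.

```python
import numpy as np

def simulate_panel(n_units=20, n_periods=10, effect=-2.0, seed=1):
    """Invented balanced panel in which half the units adopt a law at period 5."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_periods)
    period = np.tile(np.arange(n_periods), n_units)
    event = ((unit < n_units // 2) & (period >= 5)).astype(float)
    unit_effect = rng.normal(size=n_units)[unit]
    period_effect = rng.normal(size=n_periods)[period]
    noise = rng.normal(scale=0.3, size=unit.size)
    y = unit_effect + period_effect + effect * event + noise
    return unit, period, event, y

def dummies(codes):
    """Dummy columns for every level except the first (the reference level)."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

def event_effect(unit, period, event, y):
    """OLS coefficient on the event dummy, with unit and period dummies included."""
    X = np.hstack([np.ones((y.size, 1)), event[:, None], dummies(unit), dummies(period)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print("estimated event effect:", event_effect(*simulate_panel()))
```

The sketch also shows why the dummy approach fails when every unit experiences the event at the same time: the event column would then equal the sum of the post-event period dummies, so its coefficient could not be separated from the period effects.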


6.3 Exploring unit-specific effects 


Panel data analysts are sometimes guilty 
of something known as the “geographic 


aggregation assumption” (Black and Nagin, 
1998, p. 213). This amounts to assuming that 
the parameters are constant across units. For 
example, a researcher may estimate a panel 
data model to measure the effect of an event, 
such as a new law, on crime. The model will 
return one coefficient for all the units/time 
periods. The assumption that an intervention 
has the same effect on crime across the board 
(as indicated by a single coefficient) can be 
quite perilous. Pesaran and Smith (1995) found, 
for example, that pooling panels (e.g., estimat- 
ing a single coefficient that applies across all 
units and time periods) can result in signifi- 
cant bias: “[g]iven the prevalence of aggregation 
and pooling in applied work, these results are 
of some importance, and indicate that the com- 
mon assumption of homogeneity in dynamic 
models is far from innocuous” (p. 102). Black 
and Nagin (1998) used this notion to attack Lott 
and Mustard’s (1997) “more guns, less crime” 
thesis. 

How do researchers deal with this problem, 
if it exists? By interacting unit dummies with 
the predictor variable of interest. This yields 
separate estimates for each unit. 

Black and Nagin interacted Lott and
Mustard’s “right-to-carry” variable (1,0 variable 
denoting the presence or absence of a right- 
to-carry law) with state dummy variables to 
come up with different estimates for each state. 
They also experimented with a variation on 
this approach to test whether the laws had the 
same effect on crime across different time peri- 
ods. Lott and Mustard (1997) found a significant 
inverse relationship between right-to-carry laws 
and several indicators of serious crime. When 
state-specific effects were explored, however, 
Black and Nagin (1998, pp. 213-14) concluded: 
“...we strongly reject the Lott and Mustard 
model’s assumption of a uniform impact 
across states... The estimates are disparate. Mur- 
ders decline in Florida but increase in West 
Virginia. Assaults fall in Maine but increase in 
Pennsylvania...” 
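
A minimal sketch of the interaction approach follows, assuming a simulated panel of four hypothetical states that adopt a law in different periods and with different true effects. It is not a reconstruction of Black and Nagin's model; the names and numbers are invented for illustration.

```python
import numpy as np

def simulate_states(n_periods=30, noise=0.1, seed=2):
    """Invented panel: four states, staggered adoption, state-specific effects."""
    rng = np.random.default_rng(seed)
    adoption = np.array([8, 12, 16, 20])             # hypothetical adoption periods
    true_effect = np.array([-1.5, 1.0, 0.0, -0.5])   # hypothetical effects by state
    n_units = len(adoption)
    unit = np.repeat(np.arange(n_units), n_periods)
    period = np.tile(np.arange(n_periods), n_units)
    law = (period >= adoption[unit]).astype(float)
    y = (rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[period]
         + true_effect[unit] * law + rng.normal(scale=noise, size=unit.size))
    return unit, period, law, y, true_effect

def unit_specific_effects(unit, period, x, y):
    """OLS with unit and period dummies plus one coefficient on x per unit."""
    units, periods = np.unique(unit), np.unique(period)
    unit_d = (unit[:, None] == units[None, :]).astype(float)         # unit intercepts
    period_d = (period[:, None] == periods[None, 1:]).astype(float)  # first period omitted
    interactions = unit_d * x[:, None]                               # unit dummy times x
    X = np.hstack([unit_d, period_d, interactions])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-len(units):]

unit, period, law, y, true_effect = simulate_states()
print("true effects:     ", true_effect)
print("estimated effects:", unit_specific_effects(unit, period, law, y).round(2))
```

Replacing the interaction columns with the single column `x` would return the one pooled coefficient that the geographic aggregation assumption presupposes. Separate effects are identified here only because the adoption periods differ across states.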
7 Summary and additional readings


The purpose of this chapter has been to 
offer a simple introduction to the pooling 
of time series and cross-section data (which
together yield a “panel” dataset). It began by
highlighting the various issues in panel data 
and then covered means of testing and con- 
trolling for them. This was followed by a 
practical example of the links between wel- 
fare spending and serious crime. The last 
few sections explored extensions, such as 
dealing with simultaneity, modeling events, 
and moving beyond the homogeneous param- 
eters assumption. Sources on the analysis 
of panel data accessible to individuals with 
minimal statistical background include Alli- 
son (1994), Cherry (1999), Lott and Mustard 
(1997), Stimson (1985), and Worrall and Pratt 
(2004). More detailed and technical treatments 
include Wooldridge (2001), Hsiao (1986; 2003), 
Arellano (2003), and Frees (2004). 
Chapter 16


Dynamic models and cross-sectional data: the consequences of dynamic misspecification

Ronald Schoenberg


1 Introduction 


Most social variables are inherently imbed- 
ded in time and, therefore, are generated by 
a dynamic process of some kind. This paper 
explores the consequences of applying a static 
model to such variables. We find that unbiased 
estimates of the underlying dynamic parame- 
ters through the application of a static model to 
a cross-section of data are possible only if the 
underlying dynamic process is “nonergodic,”
i.e., that the process is a function of the initial
conditions. A wide-sense stationary dynamic
model that is not a function of the initial state
of the system is proposed here, and we find
that the application of static models to such
processes is not successful; in particular, we
find that the static estimates are attenuated or
inflated versions of their dynamic counterparts.

In the analysis of cross-sectional data, which
constitutes most of the data available for analysis
in sociology, some form of a linear structural
model is very often proposed and appropriately
so. Theoretical specification is often unable to


Reprinted from Social Science Research 2:133-144, 
rise above the simple assertion that, when X
increases, Y increases or decreases in proportion
(actually such a claim may not be so simple
when imbedded within an array of such claims 
with regard to a large set of variables). When 
this is so, the representation of such assertions 
in linear structural equation models is correct, 
provided as well, of course, that assumptions 
about the behavior of excluded variables and in 
some cases about the distribution of the vari- 
ables themselves in the population are plau- 
sible. And the estimated parameters of these 
models have immediate substantive interpreta- 
tion relative to the hypothesized structure of 
relations of the variables in the model. That 
is, given the adequacy of the model, we may 
expect that a given increase in one variable of 
the model will have the expected consequences 
for the other variables in the model which that 
variable is hypothesized to affect. A clearly 
defined relationship exists, then, between the 
model and the “reality” out there. 

Many other models proposed for cross- 
sectional data, however, do not represent 
directly the claims of the theory but, rather, are 
surrogate models taking the place of dynamic 
models. The source of the claims regarding the 
underlying process that has generated the data
may specify that the variables are dynamically 
related, but the form of the collected data— 
the product of, say, a one-shot cross-sectional 
study—may preclude the specification of a 
dynamic model. In these cases, investigators 
commonly propose a linear structural equation 
model of some kind and proceed to estimate 
and interpret parameters while neglecting the 
dynamic nature of the underlying process. 

Such misspecification may be more common 
than supposed, since nonexperimental socio- 
logical variables are inherently time-dependent. 
Time dependency doesn’t necessarily imply 
that a dynamic model must be specified, but 
it does mean that the behavior of the variables 
across time must be scrutinized and that the 
investigator must determine whether or not this 
behavior is such that it precludes unbiased effi- 
cient estimates of the structural parameters. In 
the following sections, we shall explore various 
time-dependent configurations and we shall 
determine precisely the consequences of each 
on the application of static models to underly- 
ing dynamic processes. 

Similar investigations have been conducted 
by other authors. The results here are gener- 
alizations of some discussion of this issue by 
Kuh (1959). Furthermore, some of the hazards 
of specifying a static model have been pointed 
out by Heise (1975) using a simplified systems 
analysis formulation which did not include 
residual error terms. Related issues surrounding 
the estimation and specification of models that 
incorporate both cross-sectional and time-series 
features appear in Balestra and Nerlove (1966), 
Young (1972), and Theil and Goldberger (1961). 


2 General dynamic linear structural 
equation model 


Let $Z_t$ be a vector of $p$ variables at time $t$, $A$ a $p \times p$
matrix of coefficients, and $Y_t$ be a vector of
residual variables. Let

$$Z_t = A Z_{t-1} + Y_t. \tag{1}$$

Furthermore, let $E(Y_t) = 0$ and $E(Y_t Y_s') = \delta_{ts}\,\Xi$,
where $\delta_{ts} = 1$ if $t = s$ and $\delta_{ts} = 0$ if $t \neq s$; i.e., the elements of $Y_t$
are not autocorrelated and have the variance-covariance
matrix $\Xi$. Equation (1) is a wide-sense
stationary linear first-order difference
equation, provided that the coefficient matrix $A$
is non-singular and that its characteristic roots
are less than one in modulus. Then,

$$Z_t = \sum_{r=0}^{\infty} A^{r} Y_{t-r} \tag{2}$$



is a solution of equation (1) which is indepen- 
dent of the initial state of the system and is 
stable (Miller, 1968, Chapter 4). If the characteristic
roots are greater than 1 in modulus, the
system will be unstable and divergent, which is
clearly a property of only short-lived processes,
for which specific models must be proposed
rather than the general type discussed
here (certainly studying such a process
cross-sectionally would be hopeless). If the roots are
zero or equal to one, then the process is a function
of the initial conditions. The analysis of
such models cross-sectionally is possible; how- 
ever, a substantive argument that would sup- 
port a dynamic model of a large-scale social 
process that is strongly dependent on the initial 
state of the system must be carefully made. 

For our purposes here, we wish to derive an
expression for the variance-covariance matrix
of $Z$ at time $t$ in terms of the coefficient matrix
$A$ and the variance-covariance matrix $\Xi$ of the
residual variables. From equation (2),

$$E(Z_t Z_{t+1}') = \sum_{r=0}^{\infty} A^{r}\,\Xi\,(A')^{r+1} \tag{3}$$

$$E(Z_{t+1} Z_t') = \sum_{r=0}^{\infty} A^{r+1}\,\Xi\,(A')^{r} \tag{4}$$

From equation (1), $\Xi = (Z_t - AZ_{t-1})(Z_t - AZ_{t-1})' = Z_t Z_t' - AZ_{t-1}Z_t' - Z_t Z_{t-1}'A' + AZ_{t-1}Z_{t-1}'A'$;
then, from equations (3) and (4), we get
$E(Z_t Z_{t-1}') = A\,E(Z_t Z_t') = A\Sigma$ and $E(Z_{t-1} Z_t') = E(Z_t Z_t')A' = \Sigma A'$ and, therefore,

$$E(\Xi) = \Sigma - A\Sigma A' - A\Sigma A' + A\Sigma A' = \Sigma - A\Sigma A' \tag{5}$$

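
Equation (5) can be checked numerically. The sketch below is an illustration under assumed values: $\Sigma$ denotes the variance-covariance matrix of $Z_t$, $\Xi$ that of the residuals (written `Xi` in the code), and the particular matrices `A` and `Xi` are hypothetical. It builds $\Sigma$ from the series implied by equation (2) and confirms that $\Sigma - A\Sigma A'$ reproduces $\Xi$.

```python
import numpy as np

def stationary_covariance(A, Xi, terms=500):
    """Approximate Sigma = sum over r >= 0 of A^r Xi (A')^r by truncating the series."""
    Sigma = np.zeros_like(Xi, dtype=float)
    term = Xi.astype(float)
    for _ in range(terms):
        Sigma += term
        term = A @ term @ A.T
    return Sigma

# Hypothetical stable system: the characteristic roots of A are 0.5 and 0.7.
A = np.array([[0.5, 0.2],
              [0.0, 0.7]])
Xi = np.array([[1.0, 0.3],
               [0.3, 2.0]])

Sigma = stationary_covariance(A, Xi)
print(np.allclose(Sigma - A @ Sigma @ A.T, Xi))
```

The truncation is harmless only because the roots of `A` are less than one in modulus; with a root of one or more the series would not converge, which is the instability discussed above.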
3 Quasi-dynamic model 


A very commonly specified linear static or
cross-sectional model is the multiple regression
model: $y = \Gamma X + z$, where $y$ is a dependent variable,
$X$ is a vector of independent variables, $\Gamma$
is a vector of regression coefficients, and $z$ is a
residual variable.* The well-known estimate of
$\Gamma$ is $\hat{\Gamma} = yX'(XX')^{-1} = \Sigma_{yx}\Sigma_{xx}^{-1}$. Suppose that the
variances and covariances in $\Sigma$ were generated
by an analogous dynamic structural model,

$$y_t = BX_{t-1} + u_t \tag{6}$$

$$X_t = X_{t-1} + V_t \tag{7}$$

or in the form of equation (1),

$$\begin{bmatrix} y_t \\ X_t \end{bmatrix} = \begin{bmatrix} 0 & B \\ 0 & I \end{bmatrix} \begin{bmatrix} y_{t-1} \\ X_{t-1} \end{bmatrix} + \begin{bmatrix} u_t \\ V_t \end{bmatrix}$$

* To simplify the present discussion, issues with
regard to means, intercepts, and sampling variability 
will be ignored. The dynamic models discussed here, 
the linear difference equations, are stationary and, 
therefore, no loss of generality is created by assuming 
that that stationary point is zero. That the measure- 
ment of variables in nondynamic structural models 
from their respective means does not endanger gen- 
erality is well known. Issues of sampling variabil- 
ity are not relevant to the discussion in this paper 
and are ignored by the presumption that we are 
studying a population. Those interested may refer 
to Goldberger (1964, p. 142) who thoroughly cov- 
ers sampling problems in univariate and bivariate 
dynamic models. Furthermore, all variable matrices 
in this paper have observation indices suppressed. 
Thus, all variable matrices (as opposed to coefficient 
matrices and variance—covariance matrices) which 
are here considered to be vectors, are implicitly 
rectangular matrices of N observations from some 
population on the appropriate number of variables, 
usually indefinite. For example, Z, in equation (1) is 
implicitly a p x N matrix of observations. 


Equation (7) implies that $X$ is fixed in an
expected sense, i.e., $E(X_t)$ and $E(X_t X_t')$ are the
same for all $t$. We can see immediately that the
coefficient matrix of this model is singular. The
variance-covariance matrix in this case will be
a function of the initial state of the system. We
will, despite this obstacle, be able to calculate
the estimate of $\Gamma$ in terms of the parameters of
the dynamic model if we assume that the initial
values are random and uncorrelated with residual
error terms. The solution of equation (7) is

$$X_t = X_0 + \sum_{r=0}^{t} V_r$$

and substituting into equation (6), we get as its
solution,

$$y_t = BX_0 + B\sum_{r=0}^{t-1} V_r + u_t$$

The population covariance matrix of $y_t$ and
$X_t$ is $\Sigma_{yx} = E(y_t X_t') = BX_0X_0'$, since $E(V_t) = E(u_t) = E(X_0V_t') = E(X_0u_t) = E(u_tV_t') = 0$.
Furthermore, $\Sigma_{xx} = E(X_tX_t') = X_0X_0'$, and, therefore,
$\hat{\Gamma} = BX_0X_0'(X_0X_0')^{-1} = B$.

Thus, we see that the parameters of the mul- 
tiple regression model are unbiased estimates 
of the parameters of the analogous dynamic 
model. 


4 Autocorrelated model 


In the previous section, we have seen that the 
static multiple regression model has no strictly 
analogous dynamic counterpart, at least in the 
sense of dynamic as developed here, a stable 
ergodic stationary process that is not a function 
of the initial conditions. However, by making 
some assumptions about the initial conditions, 
i.e., that they are uncorrelated with future dis- 
turbances, we found that unbiased estimates of 
the underlying nonergodic dynamic parameters 
resulted from the application of the static model 
to the process. In this section, we shall see that 
a static model with autocorrelated endogenous
residuals also does not have an ergodic dynamic 
counterpart, i.e., the system is a function of the 
initial conditions and that, given some assump- 
tions about the initial conditions, the properly 
specified static model yields unbiased, though 
inefficient, estimates. 


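First, let

$$y_t = BX_{t-1} + u_t \tag{8}$$

$$X_t = X_{t-1} + V_t \tag{9}$$

where

$$u_t = \phi u_{t-1} + u_t^{*} \tag{10}$$

where $u_t^{*}$ and $V_t$ have the desirable properties of
being serially and mutually uncorrelated. Substituting
equation (10) into equation (8), we get

$$y_t = BX_{t-1} + \phi u_{t-1} + u_t^{*} \tag{11}$$

Lagging equation (8) one time period and multiplying
by $\phi$, $\phi y_{t-1} = \phi BX_{t-2} + \phi u_{t-1}$, and substituting
into equation (11):

$$y_t = \phi y_{t-1} + BX_{t-1} - \phi BX_{t-2} + u_t^{*} \tag{12}$$

Since the exogenous residuals are not autocorrelated,

$$X_t = X_0 + \sum_{r=0}^{t} V_r \tag{13}$$

Substituting equation (13) into equation (12),

$$y_t = (BX_0 + \phi y_{t-1} - \phi BX_0) + \left(B\sum_{r=0}^{t-1} V_r - \phi B\sum_{r=0}^{t-2} V_r + u_t^{*}\right)$$

The cross-sectional regression estimates are $\hat{\Gamma} = \Sigma_{yx}\Sigma_{xx}^{-1} = y_tX_t'(X_tX_t')^{-1}$. But

$$E(y_tX_t') = E\left[\left(BX_0 + \phi y_{t-1} - \phi BX_0 + B\sum_{r=0}^{t-1} V_r - \phi B\sum_{r=0}^{t-2} V_r + u_t^{*}\right)\left(X_0 + \sum_{r=0}^{t} V_r\right)'\right] = BX_0X_0' + \phi y_{t-1}X_0' - \phi BX_0X_0'$$

assuming that $E(V_tV_{t+s}') = 0$, and $E(u_t^{*}V_t') = E(u_t^{*}X_0') = E(y_tV_t') = 0$.
Since

$$E(X_tX_t') = \left(X_0 + \sum_{r=0}^{t} V_r\right)\left(X_0 + \sum_{r=0}^{t} V_r\right)' = X_0X_0'$$

then

$$\begin{aligned}
\hat{\Gamma} &= (BX_0X_0' + \phi y_{t-1}X_0' - \phi BX_0X_0')(X_0X_0')^{-1} \\
&= B + \phi y_{t-1}X_0'(X_0X_0')^{-1} - \phi B \\
&= B + \phi\hat{\Gamma} - \phi B
\end{aligned}$$

since $E(y_{t-1}X_t') = E(y_tX_t')$. Solving for $\hat{\Gamma}$, $\hat{\Gamma} = B$.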
Thus, we see that the cross-sectional estimates 
of the regression coefficients are unbiased esti- 
mates of the analogous dynamic coefficients 
for the case in which the endogenous residuals
only are autocorrelated. Other authors who
have made this point (Goldberger, 1964; Hibbs,
1974) also point out that these estimates are not
efficient, however, and that statistical inference
when the endogenous residuals are autocorrelated
is extremely hazardous.


5 Dynamic autocorrelated model 


In the first part of this paper, I developed a 
model for a dynamic system which, in addi- 
tion to other features, was not a function of the 
initial state of the system. In effect, this assump- 
tion represents the claim that the present state 
of the system is entirely a function of the struc- 
ture of the system: repeated reenactment of the 
dynamic sequence for a given structure (i.e., a 
given coefficient matrix and given disturbance 
variance—covariance matrix) would result in an 
identical expected state of the system for any 
time t, whatever the initial state or whatever 
the time since the initiation of the process. For 
many social processes, of course, this will not 
be a desirable assumption, in particular for very 
small-scale processes such as those generated 
in the laboratory. However, for many large-scale 
social processes, I believe that explaining the
present state of the system in terms of the ini- 
tial state is attributing important consequences 
to essentially fortuitous events. 

A term often used to distinguish these two 
kinds of dynamic processes is “ergodic” (Feller, 
1966, p. 208). The ergodic dynamic process 
“forgets” the past, and the nonergodic process 
is a function of its initial state. As we have 
seen in the previous section of this paper, the 
static regression model, with and without the 
autocorrelation of the endogenous disturbance 
term, is nonergodic. Thus, these models are 
inappropriate when we assume that the data 
have been generated by an ergodic dynamic 
process. 

When we relax our assumptions about the 
behavior of the exogenous variables, however, 
we will find that an ergodic process results. 
We have assumed previously that $X_t = X_{t-1} + V_t$,
where $V_t$ is a matrix of serially uncorrelated
random variables. Let us, instead, specify that

$$X_t = \Theta X_{t-1} + V_t \tag{14}$$

where $V_t$ retains the properties of serial uncorrelation
and randomness, and $\Theta$ is a diagonal
coefficient matrix of autoregression coefficients.
Furthermore, let $y_t = BX_{t-1} + u_t$ where
$u_t = \phi u_{t-1} + u_t^{*}$. Then, by the results of the previous
section:

$$y_t = BX_{t-1} + \phi y_{t-1} - \phi BX_{t-2} + u_t^{*} \tag{15}$$

Putting equations (14) and (15) directly into
the form of equation (1) results in a trivial
singularity. Instead, lagging equation (14) one
time period, $X_{t-1} = \Theta X_{t-2} + V_{t-1}$, and solving for
$X_{t-2}$, we get $X_{t-2} = \Theta^{-1}(X_{t-1} - V_{t-1})$; let $w_t^{*} = \phi B\Theta^{-1}V_{t-1} + u_t^{*}$
and insert these results into
equation (15) to obtain:

$$y_t = \phi y_{t-1} + (B - \phi B\Theta^{-1})X_{t-1} + w_t^{*} \tag{16}$$

Now we may put equations (14) and (16) into
the form of equation (1):

$$\begin{bmatrix} y_t \\ X_t \end{bmatrix} = \begin{bmatrix} \phi & (B - \phi B\Theta^{-1}) \\ 0 & \Theta \end{bmatrix} \begin{bmatrix} y_{t-1} \\ X_{t-1} \end{bmatrix} + \begin{bmatrix} w_t^{*} \\ V_t \end{bmatrix} \tag{17}$$


The characteristic roots of the coefficient matrix
of this equation are equal to $\phi$ and the diagonal
elements of $\Theta$. Since negative values of $\phi$ and
$\theta$ would be difficult to interpret, $\phi$ and the elements
of $\Theta$ must be greater than zero, and, if the
system is to be stable, they must also be less
than one.

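Given that this is so, we may calculate the
variance-covariance matrix of the variables at
time $t$ using equation (5). Partitioning $\Sigma$ in equation (5) thus,

$$\Sigma = \begin{bmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix} \tag{18}$$

and

$$\Xi = \begin{bmatrix} E(w_t^{*}w_t^{*\prime}) & E(w_t^{*}V_t') \\ E(V_tw_t^{*\prime}) & E(V_tV_t') \end{bmatrix}$$

but $E(V_tw_t^{*\prime}) = E[V_t(V_{t-1}'\Theta^{-1}B'\phi + u_t^{*})] = 0$,
since, by hypothesis, the residuals in $V_t$ are
serially uncorrelated and are uncorrelated with
the residuals in $u_t^{*}$. Therefore,

$$\Xi = \begin{bmatrix} E(w_t^{*}w_t^{*\prime}) & 0 \\ 0 & E(V_tV_t') \end{bmatrix} \tag{19}$$

Substituting equations (18) and (19) and the
coefficient matrix in equation (17) into equation (5),
and solving for $\Sigma_{yx}$, we get
$\Sigma_{yx} - \phi\Sigma_{yx}\Theta - (B - \phi B\Theta^{-1})\Sigma_{xx}\Theta = 0$. Postmultiplying
by $\Sigma_{xx}^{-1}$ and using $\Sigma_{xx}\Sigma_{xx}^{-1} = I$,

$$\Sigma_{yx}\Sigma_{xx}^{-1} - \phi\,\Sigma_{yx}\Sigma_{xx}^{-1}\,\Sigma_{xx}\Theta\Sigma_{xx}^{-1} - (B - \phi B\Theta^{-1})\,\Sigma_{xx}\Theta\Sigma_{xx}^{-1} = 0$$

but, since $\hat{\Gamma} = \Sigma_{yx}\Sigma_{xx}^{-1}$, and letting $\rho = \Sigma_{xx}\Theta\Sigma_{xx}^{-1}$,
we get $\hat{\Gamma} - \phi\hat{\Gamma}\rho = (B - \phi B\Theta^{-1})\rho$, and since $\phi$ is a
scalar, $\hat{\Gamma}(I - \phi\rho) = B(I - \phi\Theta^{-1})\rho$; postmultiplying
by $(I - \phi\rho)^{-1}$, $\hat{\Gamma} = B(I - \phi\Theta^{-1})\rho(I - \phi\rho)^{-1}$.
If we let the $q$ autocorrelation coefficients
in $\Theta$ be all equal to, say, $\mu$, then
$\rho = \Sigma_{xx}\Theta\Sigma_{xx}^{-1} = \mu\Sigma_{xx}\Sigma_{xx}^{-1} = \mu I$ and $\hat{\Gamma} = [(\mu - \phi)/(1 - \mu\phi)]B$.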

The parameters of our static multiple regres- 
sion model, therefore, are proportional to the 
parameters of the analogous autocorrelated 
dynamic process, in which the constants of pro- 
portion are fairly complicated functions of the 
autocorrelation coefficients and the variances 
and covariances of the independent variables in 
the general case, and are somewhat simple func- 
tions of the autocorrelation coefficients in the 
case where those of the exogenous variables are 
all equal. Note also that, if the autocorrelation 
coefficients are all about equal in value—a per- 
fectly reasonable possibility—then the regres- 
sion coefficients calculated in the multiple 
regression model will be severely attenuated 
versions of the analogous dynamic process. 
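
To give a sense of the magnitudes involved, the proportionality constant for the equal-coefficient case, $(\mu - \phi)/(1 - \mu\phi)$, can be evaluated directly. The short sketch below simply tabulates the expression stated in the text for a few assumed values of the two autocorrelation coefficients; the function name and the chosen values are illustrative only.

```python
def static_to_dynamic_ratio(mu, phi):
    """Ratio of the static coefficient to the dynamic one: (mu - phi) / (1 - mu * phi)."""
    if not (0 < mu < 1 and 0 < phi < 1):
        raise ValueError("mu and phi are assumed to lie strictly between 0 and 1")
    return (mu - phi) / (1 - mu * phi)

# Assumed values: mu is the exogenous autocorrelation, phi the residual autocorrelation.
for mu, phi in [(0.9, 0.1), (0.9, 0.5), (0.9, 0.8), (0.5, 0.5)]:
    ratio = static_to_dynamic_ratio(mu, phi)
    print(f"mu={mu}, phi={phi}: static/dynamic ratio = {ratio:.3f}")
```

The ratio shrinks toward zero as $\phi$ approaches $\mu$, which is the attenuation described above, and it changes sign when $\phi$ exceeds $\mu$.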


6 A First-order dynamic model 


Stability of the variables in the model may 
also be produced by an explicit first-order 
autoregressive dynamic process. In the pre- 
vious section, an essentially zero-order pro- 
cess became a first-order process as a result 
of the autocorrelation. Here, we shall consider 
a model which is hypothesized directly to be 
a first-order process without autocorrelation. 
In the first-order dynamic process, we claim
that the rate of change of the dependent variable is a
function of the dependent variable at the previous
point in time as well as of the independent
variables and the residual variables.
For example, $\Delta y_t = ay_{t-1} + bx_{t-1} + u_t$ and $\Delta x_t = cx_{t-1} + v_t$,
but $\Delta y_t = y_t - y_{t-1}$ and $\Delta x_t = x_t - x_{t-1}$
and, therefore, $y_t = (1 + a)y_{t-1} + bx_{t-1} + u_t$ and
$x_t = (1 + c)x_{t-1} + v_t$ is a first-order autoregressive
dynamic structural model incorporating
one dependent and one independent variable.
This is the model most commonly specified
in multiple-wave studies (e.g., Heise, 1970;
Jöreskog and Sörbom, 1975), where $1 + a$ and
$1 + c$ are interpreted as “stability coefficients.”
Figure 16.1 will make this more apparent for
those more familiar with path diagrams. It
diagrammatically represents the aforementioned
first-order autoregressive equations.

[Figure 16.1 Path diagram of a first-order autoregressive dynamic process: $y_{t-1} \to y_t$ with coefficient $1 + a$ and residual $u_t$; $x_{t-1} \to x_t$ with coefficient $1 + c$ and residual $v_t$]

These equations may be generalized to 
any number of endogenous and exogenous 
variables: 


$$\Delta Y_t = AY_{t-1} + BX_{t-1} + U_t \tag{20}$$

$$\Delta X_t = CX_{t-1} + V_t \tag{21}$$


where A contains coefficients of the relation- 
ships among the endogenous variables off the 
diagonal and the autoregressive coefficients on 
the diagonal; B contains the coefficients of the 
relationships of the endogenous to the exoge- 
nous variables; C is a diagonal matrix of autore- 
gressive coefficients of the exogenous variables; 
Y, is the matrix of endogenous variables; X, is 
the matrix of exogenous variables; and U, and 
V, are matrices of residuals with the usual prop- 
erties. 

In the analogous static or cross-sectional
structural model let $\Pi Y = \Gamma X + Z$, where $\Pi$
is the matrix of regression coefficients of the
endogenous variables in $Y$ on the other endogenous
variables (with the diagonal normed
to 1); $\Gamma$ is the matrix of regression coefficients
of the endogenous variables in $Y$ on
the exogenous variables in $X$; and $Z$ is the
matrix of residual variables. Partitioning $\Sigma$, the
variance-covariance matrix of the variables,
into endogenous variables first and exogenous
variables second, we have

$$\Sigma = \begin{bmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix} = \begin{bmatrix} E(YY') & E(YX') \\ E(XY') & E(XX') \end{bmatrix} = \begin{bmatrix} \Pi^{-1}(\Gamma\Sigma_{xx}\Gamma' + \zeta)\Pi'^{-1} & \Pi^{-1}\Gamma\Sigma_{xx} \\ \Sigma_{xx}\Gamma'\Pi'^{-1} & \Sigma_{xx} \end{bmatrix} \tag{22}$$


where $\zeta = E(ZZ')$. Since $\Sigma_{xx}$ is ordinarily unconstrained,
then $\hat{\Sigma}_{xx} = \Sigma_{xx}$.
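Rewriting equations (20) and (21):

$$Y_t = (I + A)Y_{t-1} + BX_{t-1} + U_t \quad \text{and} \quad X_t = (I + C)X_{t-1} + V_t$$

therefore,

$$\begin{bmatrix} Y_t \\ X_t \end{bmatrix} = \begin{bmatrix} I + A & B \\ 0 & I + C \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \end{bmatrix} + \begin{bmatrix} U_t \\ V_t \end{bmatrix} \tag{23}$$

The characteristic roots of the coefficient matrix
in equation (23) are equal to the characteristic
roots of $I + A$ and $I + C$, respectively, since

$$\begin{vmatrix} I + A & B \\ 0 & I + C \end{vmatrix} = |I + A|\,|I + C|$$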
The roots of $I + C$ will be less than one in modulus
if the nonzero elements of C are less than
zero (and greater than −2). The nonzero elements
of C, which is a diagonal matrix, will therefore
be negative. The roots of $I + A$ depend on the
configuration of A. If A represents the coefficients
of an exactly recursive set of equations, for
instance, the elements of A may be arranged so
that the upper right portion of the matrix will
be all zeros. The roots will then be the diagonal
elements of $I + A$, and, necessarily, the diagonal
elements of A will be negative if the roots are
to be less than one in modulus.

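Let $\Psi = E(U_t U_t')$ and $\Theta = E(V_t V_t')$, and let the
residual covariance matrix of the stacked system in
equation (23) be the block-diagonal matrix

$$ \begin{bmatrix} \Psi & 0 \\ 0 & \Theta \end{bmatrix} $$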


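Substituting this equation, the partitioned coefficient
matrix in equation (23), and $\Sigma$, partitioned
as in equation (22), into equation (5),
and solving for $\Sigma_{yx}$, we get
$\Sigma_{yx} - (I + A)\Sigma_{yx}(I + C) - B\Sigma_{xx}(I + C) = 0$.
Postmultiplying by $\Sigma_{xx}^{-1}$ and
using $\Sigma_{xx}^{-1}\Sigma_{xx} = I$:


$$ \Sigma_{yx}\Sigma_{xx}^{-1} - (I + A)\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xx}(I + C)\Sigma_{xx}^{-1} - B\Sigma_{xx}(I + C)\Sigma_{xx}^{-1} = 0 \qquad (24) $$


Letting $\rho = \Sigma_{xx} C \Sigma_{xx}^{-1}$, then
$I + \rho = \Sigma_{xx}(I + C)\Sigma_{xx}^{-1}$. From equation (22) we have
$\Sigma_{yx}\Sigma_{xx}^{-1} = \Pi^{-1}\Gamma$, where $\Pi$ and $\Gamma$ are the parameter
matrices of the nondynamic structural model.
Substituting these results into equation (24):


$$ \Pi^{-1}\Gamma - (I + A)\Pi^{-1}\Gamma(I + \rho) - B(I + \rho) = 0 \qquad (25) $$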


If the nondynamic structural model in equation
(22) is specified such that it is exactly
identified and that $\phi$ is unconstrained, then
equation (25) contains enough equations to determine
uniquely the parameters in $\Pi$ and $\Gamma$ in terms of
the parameters in A, B, and $\rho$. These expressions
will necessarily be very complicated. They may
be considerably simplified if we let the nonzero
coefficients in C be equal to, say, $\mu$. Then,
$I + \rho = (1 + \mu)I$, and equation (25) becomes
$[1/(1 + \mu)]\Pi^{-1}\Gamma - (I + A)\Pi^{-1}\Gamma = B$ and
$\Pi^{-1}\Gamma = [\mu/(1 + \mu)I + A]^{-1}B$. Now, let P be a matrix
of the off-diagonal elements of A with zeros on the
diagonal, and let the coefficients on the diagonal of A
be equal to, say, a; then $P + aI = A$ and
$\Pi^{-1}\Gamma = [\mu/(1 + \mu)I + aI + P]^{-1}B = [(a + \mu/(1 + \mu))I + P]^{-1}B$.
Let $\tau = a + \mu/(1 + \mu)$, then


$$ \Pi^{-1}\Gamma = [I + (1/\tau)P]^{-1}(1/\tau)B \qquad (26) $$


Assuming again that the specified structural
equation model in equation (22) is identified
exactly and that $\phi$ is unconstrained, then we
may deduce from equation (26) that


$$ \Pi = I + (1/\tau)P \qquad (27) $$
$$ \Gamma = (1/\tau)B \qquad (28) $$


since the multiplication of the equations in
equation (26) would result simply in m equations
of the following form, where m is the
number of parameters in $\Pi$ and $\Gamma$ that are
being estimated: $\pi_{ij} = p_{ij}/\tau$ and $\gamma_{ij} = b_{ij}/\tau$, where
$\pi_{ij}$, $p_{ij}$, $\gamma_{ij}$, and $b_{ij}$ are the ij-th elements of
$\Pi$, P, $\Gamma$, and B, respectively. If the model is
overidentified or if the elements of $\phi$ must be
constrained to identify the model, the results are
more complex.

We see, then, that the estimated parameters of 
our static structural equation model are, in this 
case, simple proportions of the corresponding 
dynamic parameters, when in fact the underly- 
ing model that generated the data is dynamic. 
The constant of proportionality is a function of 
the autoregression coefficients of the dynamic 
model. Without making the argument too unrealistic,
we may assume that the autoregression
coefficients are all about equal to each
other. Let that magnitude be represented by
$\delta$. Then


$$ 1/\tau = (1 + \delta)/[\delta(2 + \delta)] \qquad (29) $$
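
The step from the definition of $\tau$ to equation (29) can be written out explicitly. The lines below are a worked sketch, assuming (as the text does) that the diagonal elements of A and C share the common value $\delta$, so that $a = \mu = \delta$ in $\tau = a + \mu/(1 + \mu)$:

```latex
\begin{align*}
\tau &= \delta + \frac{\delta}{1+\delta}
      = \frac{\delta(1+\delta) + \delta}{1+\delta}
      = \frac{\delta(2+\delta)}{1+\delta}, \\[4pt]
\frac{1}{\tau} &= \frac{1+\delta}{\delta(2+\delta)}.
\end{align*}
```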


Because of the manner in which the dynamic
model was specified in equations (20) and (21),
$\delta$ must be less than 0 and greater than −1.
A quick glance at equation (29) reveals that,
as $\delta$ approaches −1, i.e., as the autoregression
coefficients $1 + \delta$ of the levels approach 0
together, the constant of proportionality, $1/\tau$,
approaches 0, and the estimated cross-sectional
coefficients become very attenuated versions of
the corresponding dynamic coefficients.


7 Conclusion 


Whether we like it or not, sociological variables 
are embedded in time; and whether methodologically
convenient or not, we must ensure
that our models properly reflect that fact. 
Experimental methodology, when appropriate, 
includes the investigator’s ability to manipu- 
late the independent variables and to control 
to some extent the nature of the endogenous
residual variation, thus making reasonable the
assumptions of fixed independent variables 
and serially uncorrelated residual variation, 
and, therefore, making reasonable the speci- 
fication of static fixed effects models. Such 
manipulation and control is not possible in 
nonexperimental methodology, however, ren- 
dering unreasonable the assumption of fixed 
exogenous variables and severely narrowing the 
instances in which the static model is appropri- 
ate. When considering a static model for time- 
dependent sociological variables, we must be 
prepared to justify claims regarding the behav- 
ior of these variables across time, and the suc- 
cess of cross-sectional analysis of sociological 
variables will depend on our success in defend- 
ing such claims. 

The analysis of static and dynamic models
presented in this paper points toward an
inevitable conclusion: that efficient unbiased 
estimates of structural coefficients using static 
models are possible only when we are pre- 
pared to assume that the underlying process 
that has generated the data is nonergodic, i.e., 
that the process is a function of the initial state 
of the system; and that change in the exogenous 
variables at any point in time is the same. If, 
however, we wish to assume that the influence 
of the past vanishes in proportion to its dis- 
tance from the present, that the present state of 
the system is entirely a function of the struc- 
ture of the system and is not a function of the 
initial state of the system, then the system is 
ergodic, i.e., it has “forgotten” the past. Such 
a system, we have seen, cannot be successfully 
investigated by the use of static regression mod- 
els. Attempts to apply static models to ergodic 
dynamic processes have resulted in estimates of 
static parameters that are attenuated or inflated 
versions of the parameters of the underlying 
dynamic process. 

Because most social variables are inherently 
embedded in time and, therefore, are gener- 
ated by dynamic processes, we must pay strict 
attention to the nature of these processes. And, 
if we are to continue to apply static structural
models to these processes, we must be prepared 
to argue that they are nonergodic. 
Chapter 17


Causal analysis with nonexperimental 
panel data 


David F. Greenberg


1 Introduction


This chapter surveys methods for conducting
causal analysis with nonexperimental panel
data (data containing observations for multiple
cases at multiple, evenly spaced times). It
begins by describing the criteria used in empirical
social science research for establishing that
one variable is the cause of another. It then
takes up, in turn, the analysis of models for
qualitative outcomes, and models for quantitative
outcomes. It concludes with a discussion
of causal inference from independent cross-sections.
To keep the discussion manageable,
the chapter ignores problems arising from
measurement error and missing data.


2 Causal analysis with panel data


For purposes of this chapter, the variable X is
considered to be a cause of variable Y when
three conditions are met: there exists some type
of association between the two variables (e.g.,
a non-vanishing correlation or partial correlation),
X precedes Y in time, and there is no
other explanation for the association. This last
condition means, in particular, that the association
is not spurious.

An assessment of the first criterion can be
carried out with cross-sectional data (data for
a single time), but that is not true of the second
condition. As Paul Lazarsfeld and Marjorie
Fiske (1938) pointed out long ago, in a cross- 
sectional research design, all variables are mea- 
sured at the same time. Consequently, there is 
no way of knowing that a putative cause came 
before its supposed effect. With data measured 
at two or more points in time, the time-ordering 
of the observations is not in question. 

The gains to be had from panel data, Paul 
Lazarsfeld (1972) noted, are especially great 
when theoretical reasons can be found for 
thinking that the causal influences between 
two variables can occur in either direc- 
tion. In that circumstance, it may be unclear 
whether a correlation between X and Y 
exists because X causes Y, Y causes X, or
other variables cause both X and Y. While 
reciprocal causal effects can be distinguished 
with cross-sectional data, the statistical pro- 
cedures for doing this (e.g., two-stage least- 
squares) require strong, untestable assumptions 
about the effects of exogenous variables on 
the endogenous variables. Sometimes these 
assumptions are implausible, making their use a 
dubious proposition. With panel data, it would 
seem, this difficulty should not arise. 

Over the past half-century, methodologists 
have developed procedures for carrying out 
causal analyses with panel data. This chapter 
surveys the major methods now available for
doing this, and points to some of the issues and 
pitfalls that arise in their use. 


3 Qualitative outcomes 


Lazarsfeld’s (1972) method for determining 
the predominant direction of causal influence 
between two variables may have been the first 
statistical method developed for the particular 
purpose of analyzing panel data. We illustrate 
the method with his example: voter opinion 
in the 1940 presidential campaign. Interviewers
asked a panel of voters in June, and again in 
August, to specify the party for which they 
intended to vote in the November election, and 
to express an opinion of Republican candidate 
Wendell Wilkie. As expected, Republican vot- 
ers were more likely than Democrats to think 
well of Wilkie, but from this alone, one would 
have no way of knowing whether this was 
because Republicans were more likely to think 
favorably of their party’s nominee, whoever it 
was; or because voters who liked Wilkie were 
more likely to vote for his party. Lazarsfeld’s
method was designed to find out which influence
was stronger: the effect of party preference
on respondents’ evaluations of the candidate, or
the effect of their assessments of the candidate
on party preference.

Results of the survey are displayed in
Table 17.1, a 16-fold table constructed by
cross-tabulating preferences and opinions at the two
times. Comparing the row totals, which measure
intentions and preferences at the first
wave, with the column totals, which represent
intentions two months later, we see that
some shift has taken place. In August there
were fewer respondents expressing incongruent
responses (those holding favorable opinions
of Wilkie, but planning to vote Democratic,
and those holding unfavorable opinions of
Wilkie but planning to vote Republican) than
in June.


Table 17.1 Intention to vote and opinion of Wilkie

First wave                  Second wave
                            Democrat         Democrat     Republican       Republican   Total
                            against Wilkie   for Wilkie   against Wilkie   for Wilkie
Democrat against Wilkie     68 (f11)         2** (f12)    1** (f13)        1 (f14)      72 (n1)
Democrat for Wilkie         11* (f21)        12 (f22)     0 (f23)          1* (f24)     24 (n2)
Republican against Wilkie   1* (f31)         0 (f32)      23 (f33)         11* (f34)    35 (n3)
Republican for Wilkie       2 (f41)          1** (f42)    3** (f43)        129 (f44)    135 (n4)
Total                       82               15           27               142          266

Note: the meaning of the asterisks and underlinings is explained in the text. The
underlined frequencies are the eight cells that carry asterisks.


All the entries off the main diagonal (which
runs from upper left to lower right) represent
respondents whose views of Wilkie, or whose
voting intentions (or both), changed between
wave 1 and wave 2. The entries on the minor
diagonal (which runs from lower left to upper
right) represent respondents who changed both
their party preference and their opinion of
Wilkie. The cases on the two diagonals do not
furnish information about the direction of causal
influence because they represent either respondents
who did not change on either variable, or
respondents who changed on both. Consequently,
they provide no information as to which changed
first, party preference or opinion of the candidate.
Lazarsfeld ignores these cases, and concentrates on the
eight cells with underlined frequencies. These
cases all involve change on one variable but not
the other.

First, consider the underscored cases in the 
second and third rows of the table. These rep- 
resent cases of incongruous response at the 
first wave. They are Democrats who liked 
the Republican candidate (row 2) and Repub- 
licans who disliked the Republican candi- 
date (row 3). There are two ways incongruent 
responders could become congruent respon- 
ders. Democrats could maintain their party pref- 
erence, and come to dislike the Republican 
candidate (a response chosen by 11), or they
could maintain their opinion of the candidate 
but change their party affiliation (a response 
of 1 subject). Similarly, incongruent Republi- 
cans could become congruent responders by 
maintaining their party preference but com- 
ing to like Wilkie, or by maintaining their 
dislike for Wilkie but changing their party 
preference. All these movers from incongruous 
to congruous responses are marked by a single 
asterisk. 

The first and fourth rows of the table show
responses that are initially congruent. They are 
Democrats who disliked Wilkie, and Republi- 
cans who liked him. A double asterisk marks 
those who moved from congruence to incongru- 
ence by abandoning their party or revising their 
views of the candidate. 

Lazarsfeld argues that if the influence of party 
preference on candidate assessment is stronger 
than the reverse influence, more of the respon- 
dents will maintain their party preference but 


alter their views of the candidate than vice 
versa. If the influence of candidate assessment 
is stronger, more of the respondents will main- 
tain their view of the candidate but alter their 
party preference. A significance test can be car- 
ried out to determine whether the difference 
in change patterns is greater than expected on 
the basis of chance, given the row and column 
totals. 
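
The comparison Lazarsfeld proposes can be sketched in a few lines of Python. The cell frequencies below are those of Table 17.1, with hard-to-read cells filled in from the row and column totals (an assumption of this sketch). The significance test shown is a simple exact sign test on the one-variable changers. It is a stand-in chosen for brevity, not the test conditioned on the row and column totals that the text describes, and the function name `two_sided_sign_test` is hypothetical.

```python
from math import comb

# Table 17.1 frequencies: rows = June state, columns = August state.
# State order: Democrat/against, Democrat/for, Republican/against, Republican/for.
# Cells that are hard to read in the printed table are filled in from the
# row and column totals (an assumption of this sketch).
table = [
    [68, 2, 1, 1],
    [11, 12, 0, 1],
    [1, 0, 23, 11],
    [2, 1, 3, 129],
]

# Each state is a (party, opinion of Wilkie) pair.
states = [("D", "against"), ("D", "for"), ("R", "against"), ("R", "for")]

kept_party = 0    # party unchanged, opinion of the candidate changed
kept_opinion = 0  # opinion unchanged, party changed
for i, (party1, opinion1) in enumerate(states):
    for j, (party2, opinion2) in enumerate(states):
        if party1 == party2 and opinion1 != opinion2:
            kept_party += table[i][j]
        elif party1 != party2 and opinion1 == opinion2:
            kept_opinion += table[i][j]


def two_sided_sign_test(k, n):
    """Exact two-sided binomial test of k successes in n trials with p = 0.5."""
    tail = sum(comb(n, x) for x in range(0, min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


n_changers = kept_party + kept_opinion
p_value = two_sided_sign_test(kept_party, n_changers)
print("kept party, changed opinion:", kept_party)
print("kept opinion, changed party:", kept_opinion)
print("sign-test p-value:", p_value)
```

Under these frequencies, far more respondents kept their party and revised their opinion of the candidate than the reverse, which is the pattern Lazarsfeld reads as party preference being the stronger influence.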

This method is limited to pairs of binary 
variables. Variables with more than two cat- 
egories could be accommodated only by 
dichotomizing, a procedure that loses infor- 
mation. In addition, the procedure becomes 
cumbersome when additional variables need 
to be taken into account. Consequently, ener- 
gies have been invested primarily in meth- 
ods for handling continuous variables. These 
methods will be reviewed in Section 4 of this 
chapter. 

Nevertheless, a few researchers have found 
distinct advantages in working with dichoto- 
mous variables. Ronald Kessler (1977), for 
example, observed that linear models for con- 
tinuous variables treat increases and decreases 
in a variable symmetrically. However, it is pos- 
sible that party preference has a different effect 
on prospective voters who dislike a candidate 
than it does on those who like that candi- 
date. Linear panel models (models for contin- 
uous variables), however, treat those effects as 
symmetrical. The analysis of cross-tabulations 
is well-suited to the study of asymmetrical 
influences. 

The additional information that can be 
extracted from the 16-fold table through strate- 
gic comparisons of cell frequencies can be seen 
by considering Kessler’s analysis of a two-wave 
survey of high school students surveyed in the 
Fall of 1971, and again in the Spring of 1972. 
Table 17.2 summarizes the responses to a “yes— 
no” question about marijuana use, and a scale 
of depression that has been dichotomized at the 
median. 
Table 17.2 Marijuana use and depression

First wave            Second wave
                      User          User          Nonuser       Nonuser       Total
                      depressed     undepressed   depressed     undepressed
User depressed        225 (f11)     110 (f12)     78 (f13)      45 (f14)      488 (n1)
User undepressed      78 (f21)      243 (f22)     15 (f23)      77 (f24)      413 (n2)
Nonuser depressed     159 (f31)     58 (f32)      1019 (f33)    482 (f34)     1718 (n3)
Nonuser undepressed   44 (f41)      135 (f42)     401 (f43)     1586 (f44)    2166 (n4)
Total                 536           546           1513          2190          4785


We consider the following four comparisons:
depressed users becoming undepressed versus
depressed nonusers becoming undepressed


$(f_{12} + f_{14})/n_1 = (110 + 45)/488 = .318$
versus
$(f_{32} + f_{34})/n_3 = (58 + 482)/1718 = .314$
$\Delta = .318 - .314 = .004$


undepressed users becoming depressed versus
undepressed nonusers becoming depressed


$(f_{21} + f_{23})/n_2 = (78 + 15)/413 = .225$
versus
$(f_{41} + f_{43})/n_4 = (44 + 401)/2166 = .205$
$\Delta = .225 - .205 = .020$


depressed users stopping use versus
undepressed users stopping use


$(f_{13} + f_{14})/n_1 = (78 + 45)/488 = .252$
versus
$(f_{23} + f_{24})/n_2 = (15 + 77)/413 = .223$
$\Delta = .252 - .223 = .029$


depressed nonusers initiating use versus
undepressed nonusers initiating use


$(f_{31} + f_{32})/n_3 = (159 + 58)/1718 = .126$
versus
$(f_{41} + f_{42})/n_4 = (44 + 135)/2166 = .083$
$\Delta = .126 - .083 = .043$


The differences are all quite small, indi- 
cating that the initial mental state and drug 
use status have at most weak effects on drug 
use and depression at a later time. Ignor- 
ing the small size of the effects, it appears 
from the first of the comparisons that among 
those who were depressed, marijuana users 
were a bit more likely than nonusers to 
lose their depression. The second compari- 
son shows that among those who were not 
depressed, marijuana users were more likely 
than nonusers to become depressed. The third 
comparison shows depressed users to be more 
likely than users who are not depressed to 
stop using. Lastly, depression raises the like- 
lihood that a nonuser will initiate marijuana 
use. 
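
The four comparisons can be reproduced directly from the cell frequencies. The following is a minimal sketch using the frequencies and row totals printed in Table 17.2; the helper name `share` is hypothetical, and, as in the text, each difference is taken between proportions already rounded to three decimals.

```python
# Table 17.2 as printed: rows = first-wave state, columns = second-wave state.
# State order: 1 user/depressed, 2 user/undepressed,
#              3 nonuser/depressed, 4 nonuser/undepressed.
f = [
    [225, 110, 78, 45],
    [78, 243, 15, 77],
    [159, 58, 1019, 482],
    [44, 135, 401, 1586],
]
n = [488, 413, 1718, 2166]  # row totals as printed in the table


def share(row, cols):
    """Proportion of first-wave row `row` found in second-wave columns `cols` (1-based)."""
    return round(sum(f[row - 1][c - 1] for c in cols) / n[row - 1], 3)


# (label, first proportion, second proportion)
comparisons = [
    ("losing depression: users vs nonusers", share(1, (2, 4)), share(3, (2, 4))),
    ("becoming depressed: users vs nonusers", share(2, (1, 3)), share(4, (1, 3))),
    ("stopping use: depressed vs undepressed", share(1, (3, 4)), share(2, (3, 4))),
    ("initiating use: depressed vs undepressed", share(3, (1, 2)), share(4, (1, 2))),
]

differences = []
for label, p_first, p_second in comparisons:
    # As in the text, the difference is taken between the rounded proportions.
    delta = round(p_first - p_second, 3)
    differences.append(delta)
    print(f"{label}: {p_first:.3f} vs {p_second:.3f}, difference {delta:.3f}")
```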

To determine the role of third variables in 
maintaining an initial pattern, for each row in 
Table 17.2 we construct a 2x2 table, always 
placing the frequency of cases in which no 
change in either variable is experienced in the 
upper left cell, the frequency of cases in which 
both variables change in the lower right cell,
and the frequency of cases for which just one
variable changes in the off-diagonal cells. The
tendency of the time-1 cross-sectional distribution
to be preserved at time-2, above and
beyond the level required by the row and column
distributions, can be assessed by the Q
statistic, defined by the equation


$$ Q = (ad - bc)/(ad + bc) \qquad (1) $$


where a and d are the frequencies in the upper left
and lower right cells of the 2x2 table, and b and c
are the two off-diagonal frequencies.


For the four rows in Table 17.2, the Q statistic
is .083, −.245, −.129 and .126. The significance
of each of the cross-lagged effects
and these maintenance effects can be tested 
by fitting a Goodman log-linear model to the 
data. Todd Miller and Brian Flay (1996) have 
developed an alternative strategy for analyz- 
ing causal relationships from categorical data, 
specifically designed to examine stage-like phe- 
nomena (where one type of event — such as 
smoking tobacco cigarettes — is a stepping-stone 
to another type of event — such as smoking 
marijuana). 
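
The four Q values quoted above can be recovered from Table 17.2. The sketch below assumes that Q is Yule's Q, (ad − bc)/(ad + bc), with a the no-change frequency and d the both-change frequency for each first-wave row; that assumption reproduces the quoted values from the cell frequencies as printed. The function name `maintenance_q` is hypothetical.

```python
# Table 17.2 as printed: rows = first-wave state, columns = second-wave state.
# State order (0-based): 0 user/depressed, 1 user/undepressed,
#                        2 nonuser/depressed, 3 nonuser/undepressed.
f = [
    [225, 110, 78, 45],
    [78, 243, 15, 77],
    [159, 58, 1019, 482],
    [44, 135, 401, 1586],
]


def maintenance_q(row):
    """Q for first-wave state `row`, assuming Q = (ad - bc) / (ad + bc)."""
    no_change = f[row][row]        # a: neither variable changes
    both_change = f[row][3 - row]  # d: both variables change
    # b and c: exactly one of the two variables changes
    b, c = (f[row][k] for k in range(4) if k not in (row, 3 - row))
    same = no_change * both_change
    cross = b * c
    return (same - cross) / (same + cross)


q_values = [round(maintenance_q(r), 3) for r in range(4)]
print(q_values)  # [0.083, -0.245, -0.129, 0.126]
```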

The methods just summarized fail to take into 
account the length of time during which a variable
changes. James Coleman’s (1964) method
for analyzing panel data does take it into account.
Assuming that change can take place at any time
between the first and second wave, we write 
linear differential equations representing the 
flow of cases from one state to another. If, 
for example, there are four states defined by 
the combination of two binary variables (such 
as the use or nonuse of marijuana and the 
presence or absence of depression), the equa- 
tion representing flows of cases into state 1 
would be 


$$ \frac{dn_1}{dt} = r_{12}n_2 + r_{13}n_3 + r_{14}n_4 - (r_{21} + r_{31} + r_{41})n_1 \qquad (2) $$


Here $r_{ij}$ represents the instantaneous rate at
which cases are moving from state j into state i.
The negative sign on the term in parentheses
indicates that these coefficients represent transitions
out of state 1 into the other three states. Similar
equations can be written for $n_2$, $n_3$ and $n_4$.
With two waves of data, the 16 coefficients can 
be estimated from the 16 observed frequencies. 
The computations are too complicated to cover 
here, but Coleman (1964) shows how approxi- 
mate solutions can be obtained. 
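
To make the flow equations concrete, the sketch below projects first-wave counts forward under a set of transition rates. It runs the model forward only; it is not Coleman's estimation procedure, which works backward from the observed table to the rates. The rate values are invented for illustration, the elapsed time of one unit is arbitrary, and simple Euler integration is used in place of an exact solution.

```python
# Hypothetical instantaneous transition rates: rates[i][j] is the rate of flow
# from state j into state i (per unit time). Invented for illustration only.
rates = [
    [0.00, 0.10, 0.05, 0.01],
    [0.08, 0.00, 0.01, 0.04],
    [0.06, 0.01, 0.00, 0.12],
    [0.01, 0.05, 0.15, 0.00],
]


def derivatives(n):
    """dn_i/dt = sum_j r_ij * n_j - (sum_j r_ji) * n_i, the form of equation (2)."""
    k = len(n)
    out = []
    for i in range(k):
        inflow = sum(rates[i][j] * n[j] for j in range(k) if j != i)
        outflow = sum(rates[j][i] for j in range(k) if j != i) * n[i]
        out.append(inflow - outflow)
    return out


def project(n0, elapsed, steps=10000):
    """Euler integration of the flow equations over `elapsed` time units."""
    n = list(n0)
    dt = elapsed / steps
    for _ in range(steps):
        d = derivatives(n)
        n = [x + dt * dx for x, dx in zip(n, d)]
    return n


first_wave = [488.0, 413.0, 1718.0, 2166.0]  # first-wave row totals of Table 17.2
second_wave = project(first_wave, elapsed=1.0)
print([round(x, 1) for x in second_wave])
print(round(sum(second_wave), 6))  # flows only move cases, so the total is conserved
```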


4 Quantitative (interval-level) 
outcomes 


4.1 Cross-lagged panel correlations 


The earliest method for carrying out a causal 
analysis with interval-level panel data was 
inspired by Lazarsfeld’s procedures for analyz- 
ing dichotomous data. The method, known by 
its acronym CLPC (cross-lagged panel correla- 
tions), tries to answer the question, “Which 
is the more important influence, X on Y, or 
Y on X?”, when measurements on these vari- 
ables are available at just two points in time. 
As initially formulated, the method compares 
the correlation between X measured at time 1 
and Y measured at time 2 with the correla- 
tion between Y measured at time 1 and X 
measured at time 2. Using subscripts to identify
the wave, a comparison is made between
$r_{x_1 y_2}$ and $r_{y_1 x_2}$. The correlation that is larger
in magnitude identifies the stronger influence
(Campbell and Stanley, 1963). Originators of the 
method boasted that it could yield conclusions 
about causality on the basis of nonexperimental 
data that were virtually as trustworthy as those 
achieved with an experimental design involv- 
ing random assignment of subjects to different 
conditions (a procedure that eliminates spuri- 
ousness as a possible explanation for a relation- 
ship between two variables). 

Relieved of concerns about spuriousness that 
had up to then vexed researchers working with 
nonexperimental data, flocks of researchers 
adopted the new method. Soon, however, 
methodological critiques established that the 
CLPC method worked only under special 
conditions that did not always hold. These
assumptions were:


1. that the relationships being studied be
stationary (cross-sectional correlations not
changing over time) and

2. that the autocorrelations of each variable be
equal. This means that $r_{x_1 x_2}$ had to equal $r_{y_1 y_2}$.


Some researchers modified the procedure so
that the CLPC criterion involved a comparison
between the partial correlations $r_{x_1 y_2 \cdot y_1}$ and
$r_{y_1 x_2 \cdot x_1}$ rather than the zero-order correlations
(Pelz and Andrews, 1964). However, the development
of more powerful statistical methods
led most researchers to abandon the CLPC strat- 
egy as a method for analyzing causal influences. 
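
The two versions of the CLPC comparison can be illustrated on simulated data. The sketch below generates a hypothetical two-wave panel in which x influences later y but not the reverse, then computes the zero-order cross-lagged correlations and the partial cross-lagged correlations. The data-generating coefficients, sample size, and function names are assumptions made for illustration only.

```python
import random
from math import sqrt


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)


def partial_corr(u, v, w):
    """Correlation of u and v with w partialled out."""
    r_uv, r_uw, r_vw = corr(u, v), corr(u, w), corr(v, w)
    return (r_uv - r_uw * r_vw) / sqrt((1 - r_uw ** 2) * (1 - r_vw ** 2))


# Hypothetical two-wave panel: x1 affects y2, but y1 does not affect x2.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(500)]
y1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]
y2 = [0.6 * b + 0.4 * a + rng.gauss(0, 1) for a, b in zip(x1, y1)]

# Zero-order cross-lagged correlations: r(x1, y2) versus r(y1, x2).
zero_order = (corr(x1, y2), corr(y1, x2))
# Partial cross-lagged correlations, each controlling the earlier value of the outcome.
partial = (partial_corr(x1, y2, y1), partial_corr(y1, x2, x1))

print("zero-order:", [round(r, 3) for r in zero_order])
print("partial:   ", [round(r, 3) for r in partial])
```

In this constructed case both comparisons favor x as the stronger influence. The example says nothing about how the comparison behaves when the stationarity and equal-autocorrelation assumptions fail.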

David Kenny (1975; 1979, pp. 235-39) res- 
urrected the comparison of cross-lagged panel 
correlations as a test for spuriousness. If the 
two cross-lagged correlations are not signifi- 
cantly different from one another, he noted, 
the relationship between X and Y is spurious. 
Under conditions of stationarity this is a valid 
test for spuriousness. As we will see, structural 
equation modeling leads to an additional test 
of spuriousness that is valid whether or not 
stationarity holds. Consequently, the CLPC is of 
limited value as a test for spuriousness. 

Faced with the limitations of the CLPC 
method, most researchers turned to other meth- 
ods for studying causal relationships with panel 
data. These methods include structural equa- 
tion modeling (SEM), the pooling of cross- 
sections, and latent growth curve modeling. 


4.2 Causal analysis in structural 
equation modeling 


SEM begins by postulating a linear causal 
model based on theoretical considerations. For 
a two-wave panel data set, one begins by writ- 
ing a linear structural equation for a continuous, 
interval-level dependent variable y at time 2. If 
x is an independent variable, the most general 
equation will utilize y at time 1, and x at times 1 


and 2, as predictors of y at time 2, with random 
error term e: 


$$ y_{i2} = a + b_1 y_{i1} + b_2 x_{i1} + b_3 x_{i2} + e_i \qquad (10) $$


Where it is not needed for clarity, the first 
index, designating the case, will be suppressed. 
Additional predictors can be added easily; we 
will concentrate on the two-variable case to 
avoid complicating the equations unnecessar- 
ily. If there are more waves, one can write addi- 
tional equations for them, in each case using 
the lagged endogenous variable and the con- 
temporaneous and lagged exogenous variables 
as predictors. 

Ideally, the time between two waves should 
correspond roughly to the length of time for 
significant change in the dependent variable 
to occur. Someone who tried to determine the 
influence of smoking cigarettes on lung can- 
cer would find none if analyzing observations 
taken a day apart, because it typically takes 
years before smoking leads to cancer. 

The interpretation of the coefficients in equation
(10) can be clarified by subtracting the
lagged value of y from both sides of the equation.
Denoting the difference operator by $\Delta$,
we have


$$ \Delta y_i = y_{i2} - y_{i1} = a + (b_1 - 1)y_{i1} + b_2 x_{i1} + b_3 x_{i2} + e_i \qquad (11) $$


The left-hand side in this equation represents 
change in y. Comparing the coefficients in this 
equation with those in equation (10), we see 
that all the coefficients are unchanged, except 
for the coefficient of the time 1 (lagged) endoge- 
nous variable. This coefficient is the original 
coefficient minus 1. It follows that we can go 
back and forth between an equation for the level 
of the dependent variable in the second wave, 
and an equation for the change in that variable 
between the first and second waves, with the 
simplest algebraic maneuver. 
If we wish to interpret these equations as
expressing a causal relationship, then a represents
a constant change for all cases between
the first and second wave. The coefficient $b_1 - 1$
represents the effect of the value of y at time 1
on change in y. One might wonder whether it 
is legitimate to think of y at time 1 as causing 
y at time 2. In many circumstances it is quite 
plausible to do so. Someone who has been vic- 
timized by a crime may respond by retaliating 
against the perpetrators, so that crime leads to 
crime. Actions at a given time may result in pos- 
itive or negative reinforcement that enhances 
or reduces the likelihood of that activity in 
the future. A student’s mastery of mathematics 
skills at a particular time may provide the foun- 
dation for further learning of more advanced 
mathematics topics, and so may prove to be 
a cause of mathematical competence at a later 
age. Children who have many friends may, 
by interacting with them, acquire social skills 
that enable them to widen their friendship net- 
works. In circumstances like this, it is meaning- 
ful to consider a variable to be influencing its 
own level at a later time. 

The coefficient $b_2$ is the effect of x at time 1
on change in y. For example, the level of crime 
in a community could stimulate growth in the 
police force. The number of hours children 
spend doing homework or watching television 
could influence the rate at which their grades 
improve. The level of unemployment in a com- 
munity may influence the rate at which prices 
rise. 

If taken at face value, the coefficient $b_3$
expresses the effect of x at time 2 on the change 
y undergoes between times 1 and 2. Yet, the 
principle of causality insists that an event at 
a particular time can only influence events 
at future times. Notwithstanding this seeming 
difficulty, many researchers use this type of 
specification to represent the contemporaneous 
influence of one variable on change in another. 
Because time is often measured crudely, this 
term can be taken to represent the effect of 


influences that act quickly on a timescale set by 
the interval between waves. An alternative way 
of understanding this coefficient is to realize 
that the equality 


$$ x_2 = x_1 + \Delta x \qquad (12) $$


allows us to reparametrize equation (10) as 
expressing the lagged effect of x on y and the 
effect of change of x on change in y. 

If the coefficients in equation (10) remain 
constant for a long period, it is possible to 
project the long-term behavior of the system. 
It is easy to show that y will asymptotically 
approach an equilibrium value 


$$ y_{EQ} = \frac{a + (b_2 + b_3)x}{1 - b_1} \qquad (13) $$

provided that x is held constant and the coefficient
$b_1$ is less than 1 in absolute value.
We see that the effect of a change in x will
be magnified by the factor $1/(1 - b_1)$. This can
be large. Consequently, the coefficients $b_2$ and
$b_3$, which represent the short-term effects of a
change of x on y, give an incomplete view of
the full, long-term consequences of change.
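
The approach to equilibrium can be checked numerically by applying equation (10) repeatedly with x held fixed and the error term set to zero. The coefficient values below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical coefficients for equation (10); x is held constant over time.
a, b1, b2, b3 = 2.0, 0.5, 0.1, 0.2
x = 10.0

# Equilibrium value from equation (13).
y_eq = (a + (b2 + b3) * x) / (1 - b1)

# Iterate y_next = a + b1*y + b2*x + b3*x from an arbitrary starting value.
y = 0.0
path = [y]
for _ in range(200):
    y = a + b1 * y + b2 * x + b3 * x
    path.append(y)

long_run_multiplier = 1 / (1 - b1)  # factor applied to the short-run effect b2 + b3

print("first steps:", [round(v, 3) for v in path[:5]])
print("iterated value:", round(y, 6), "equilibrium formula:", round(y_eq, 6))
print("long-run effect of a unit change in x:", (b2 + b3) * long_run_multiplier)
```

With these values the short-run effect of x is 0.3 per wave, while the long-run effect is twice that, because $1/(1 - b_1) = 2$.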

In some research problems, there may be per- 
suasive theoretical reasons for thinking that one 
variable could influence another with a lag and 
also contemporaneously, and that there could, 
in addition, be a “change causes change” con- 
tribution. In this case, the researcher may be 
tempted to introduce all three terms ($x_1$, $x_2$ and
$\Delta x$) as predictors. Yet, because of the perfect
linear dependence of these three variables, it is 
impossible to do this, at least if all influences 
are thought to be linear, and no additional infor- 
mation about their effects is available. Trying to 
do so would lead to perfect multicollinearity. 
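
The linear dependence can be seen in the cross-product matrix of the three predictors, which is singular whenever all three are included. The sketch below uses invented integer scores for five cases so that the determinants are exact; the helper name `det3` is hypothetical.

```python
# Hypothetical two-wave scores for five cases (integers, so arithmetic is exact).
x1 = [3, 5, 2, 8, 6]
x2 = [4, 4, 5, 9, 7]
dx = [later - earlier for earlier, later in zip(x1, x2)]  # change score, x2 - x1

columns = [x1, x2, dx]

# Cross-product (Gram) matrix X'X for the three predictors.
gram = [[sum(p * q for p, q in zip(u, v)) for v in columns] for u in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


det_all_three = det3(gram)
# With only two of the three predictors the cross-product matrix is invertible.
det_x1_x2 = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]

print("determinant with x1, x2 and change score:", det_all_three)
print("determinant with x1 and x2 only:", det_x1_x2)
```

A zero determinant means the normal equations have no unique solution, which is why only two of the three terms can be estimated.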

At times, this circumstance leads to genuine 
ambiguity as to the nature of the causal process 
at work. At other times, however, some possi- 
bilities will seem more plausible than others. 
Consider the relationship between indices mea- 
suring time spent socializing and involvement 
in theft and vandalism, in the first two waves
of the Youth in Transition study of a nation- 
ally representative sample of high school stu- 
dents (Rosenberg and Rosenberg, 1978). The 
means, standard deviations, and correlations 
among these variables, measured a year apart, 
are shown in Panel A of Table 17.3. As 
expected, each time-2 variable is positively cor- 
related with its own time-1 value. The cross- 
sectional correlations between social life and 
theft-vandalism are positive, as are the cross- 
lagged coefficients. The mean social life score 
rose slightly over the course of a year, while 
involvement in theft and vandalism dropped 
slightly. 


Let us use these data to assess the influence 
of social life on theft. If we regress THEFT at 
time 2 on THEFT at time 1, and social life at 
times 1 and 2, we obtain the results shown in 
Panel B of Table 17.3. As expected, the effect of 
THEFT at time 1 on THEFT at time 2 is positive. 
The influence of initial levels of theft on change
in theft is measured by .418 − 1 = −.582, which
is substantially negative. We thus conclude that
involvement in THEFT tended to decline more 
for those subjects with initially high levels of 
involvement than for the group as a whole, and 
it tended to decline less for those with initially 
low levels of involvement than for the group 
as a whole. The lagged and contemporaneous 
effects of social life are both positive but quite 


Table 17.3. The relationship between undesirable life events and self-esteem


Panel A Correlations among the variables at two points in time*

Variable   SL1           THEFT1   SL2           THEFT2   Mean      SD
SL1        1                                             333.368   93.999
THEFT1     .22           1                               153.097   54.509
SL2        [illegible]   .16      1                      371.566   92.516
THEFT2     .18           .47      [illegible]   1        137.193   48.945

* n = 1412. SL = social life, THEFT = theft and vandalism. Source: Bynner, O’Malley
and Bachman (1981).


Panel B Regression of THEFT2 on THEFT1, SL1 and SL2

Variable   b        SE      beta   t        sig
constant   49.969   5.803          8.611    .000
THEFT1     .418     .022    .448   18.693   .000
SL1        .023     .015    .042   1.533    .125
SL2        .042     .015    .077   2.820    .005

R² = .231
small. Only the contemporaneous effect is
statistically significant at the .05 level.¹

Taken at face value, these results suggest
that the effect of social life on theft tends
to be short-lived. Yet one might ask how we
know that we should not instead write SL2 as
SL1 + ΔSL. If we did this, we could eliminate
SL2 and obtain the prediction equation


$$\text{THEFT2} = 49.969 + .418\,\text{THEFT1} + .065\,\text{SL1} + .042\,\Delta\text{SL} \tag{14}$$


In this equation, all coefficients are
statistically significant, and, of course, R² remains
unchanged at .231. This equation contains nothing
out of the ordinary. It says that the initial
level of social life and change in social life both
slightly increase levels of involvement in theft.
Alternatively, we could eliminate SL1 and obtain
the prediction equation


$$\text{THEFT2} = 49.969 + .418\,\text{THEFT1} + .065\,\text{SL2} - .023\,\Delta\text{SL} \tag{15}$$


Now the coefficient for the effect of change in
social life on change in theft is of opposite sign
to the contemporaneous effect of social life on
theft. It is difficult to think of a theoretical
reason why social life should increase theft, but an
increase in social life should reduce theft. As
it happens, the coefficient −.023 is not
statistically significant (p = .125), and can be dropped
from the equation. Had the sample size been
larger, however, it would have been significant,
and would then pose a problem in interpretation.
In that circumstance, equation (15) would
offer a less plausible understanding of the
influence of social life on theft than equation (14).

¹ The data were collected through a complex sample
design, making conventional inferential statistics
invalid. Nevertheless, we follow the practice of
earlier analysts of these data (Bynner, O'Malley and
Bachman, 1981) in using standard t-tests, unadjusted
for sample design. A proper analysis would use
sample weights.
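
The equivalence of the three parameterizations can be checked numerically. The sketch below takes the coefficients from Panel B of Table 17.3 and confirms that the levels form, equation (14), and equation (15) give identical predictions; the three test cases are hypothetical scores chosen near the sample means, not observations from the data.

```python
# Coefficients from Panel B of Table 17.3 (levels parameterization).
B_CONST, B_THEFT1, B_SL1, B_SL2 = 49.969, 0.418, 0.023, 0.042

def predict_levels(theft1, sl1, sl2):
    """Prediction using SL1 and SL2 as entered in Panel B."""
    return B_CONST + B_THEFT1 * theft1 + B_SL1 * sl1 + B_SL2 * sl2

def predict_eq14(theft1, sl1, sl2):
    """Equation (14): SL2 replaced by SL1 + dSL."""
    d_sl = sl2 - sl1
    return B_CONST + B_THEFT1 * theft1 + (B_SL1 + B_SL2) * sl1 + B_SL2 * d_sl

def predict_eq15(theft1, sl1, sl2):
    """Equation (15): SL1 replaced by SL2 - dSL."""
    d_sl = sl2 - sl1
    return B_CONST + B_THEFT1 * theft1 + (B_SL1 + B_SL2) * sl2 - B_SL1 * d_sl

# Hypothetical (THEFT1, SL1, SL2) scores, for illustration only.
cases = [(153.0, 333.0, 372.0), (100.0, 400.0, 350.0), (200.0, 250.0, 250.0)]
max_gap = max(
    max(abs(predict_levels(*c) - predict_eq14(*c)),
        abs(predict_levels(*c) - predict_eq15(*c)))
    for c in cases
)
print(f"largest difference between parameterizations: {max_gap:.2e}")
```

Because the three forms are algebraically identical, the data cannot choose among them; only the interpretation attached to each coefficient changes.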

It would be plausible, however, to think that 
change in social life leads to change in theft, 
with no instantaneous or lagged effect of a level 
of social life on time-2 theft. Estimating this 
model, with a lagged endogenous variable, we 
obtain 


$$\text{THEFT2} = 69.636 + .439\,\text{THEFT1} + .009\,\Delta\text{SL} \tag{16}$$


In this equation, R² is .221, and the coefficient
of .009 is not statistically significant (p = .470).
The loss of explanatory power resulting from
the omission of the contemporaneous effect
in equation (15) is significant. We thus have
clear evidence that the level of social life
influences change in the level of theft, while a pure
"change causes change" model does not perform well.

Thus far we have been assuming that the 
model being estimated is one that directly 
describes the causal influences at work between 
the two variables. There are, however, addi- 
tional circumstances in which it may be appro- 
priate to estimate equation (10). Let us consider 
a few of the most commonly encountered 
examples. 

Suppose that x influences y contemporane- 
ously: 


$$y_t = a + b x_t + e_t \tag{17}$$


with no lagged endogenous variable. Suppose 
also that the residuals, instead of being inde- 
pendent of one another, are characterized by 
first-order serial correlation, 


$$e_t = \rho e_{t-1} + u_t \tag{18}$$


We assume that the $u_t$ are independent and
uncorrelated with $e_{t-1}$ and with $x_t$. Although
the OLS estimates of a and b in equation (17)
are unbiased even in the presence of serially
correlated errors, the estimates are inefficient.
The standard errors are not as small as they
would be in the absence of serially correlated
errors. To obtain greater precision in estimation,
we eliminate the serial correlation. To do
this, lag equation (17) by one period, multiply
it by $\rho$, and subtract the result from
equation (17). After rearranging terms, we have


$$y_t = a(1-\rho) + \rho y_{t-1} - b\rho x_{t-1} + b x_t + u_t \tag{19}$$


Now we have an equation that includes a 
lagged dependent variable, and has no serially 
correlated errors. If the model is correct, the
coefficients of equation (10) are constrained:


$$-\frac{\text{coefficient of } x_{t-1}}{(\text{coefficient of } x_t)\,(\text{coefficient of } y_{t-1})} = 1 \tag{20}$$


This constraint can be evaluated roughly by 
inserting estimates into equation (20) and see- 
ing whether the equality is approximately 
valid. Taking the coefficients from Panel B of 
Table 17.3, we find that the left-hand side is 
−(.023)/(.042)/(.418) = −1.31. This is far from
the right-hand side, which equals 1, suggesting that first-order
serially correlated errors are not responsible for 
the presence of the lagged endogenous variable. 
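
The rough check just described is easy to reproduce. In the sketch below the function name `common_factor_ratio` and the values b = 0.5 and ρ = 0.6 are hypothetical; the first call uses the estimates from Panel B of Table 17.3, and the second uses the coefficients that equation (19) would imply if serial correlation alone were responsible for the lagged terms.

```python
def common_factor_ratio(b_y_lag, b_x_lag, b_x_now):
    """-(coef of x_{t-1}) / (coef of x_t * coef of y_{t-1}); equals 1 under equation (19)."""
    return -b_x_lag / (b_x_now * b_y_lag)

# Estimates from Panel B of Table 17.3.
observed = common_factor_ratio(b_y_lag=0.418, b_x_lag=0.023, b_x_now=0.042)

# Coefficients implied by equation (19) for illustrative (hypothetical) b and rho:
# y_{t-1} has coefficient rho, x_{t-1} has -b*rho, and x_t has b.
b, rho = 0.5, 0.6
implied = common_factor_ratio(b_y_lag=rho, b_x_lag=-b * rho, b_x_now=b)

print(round(observed, 2), round(implied, 2))
```

Whatever values of b and ρ are chosen, the implied ratio is 1, which is why a ratio of −1.31 counts against the serial correlation explanation.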

A more formal test of the constraint com- 
pares the goodness of fit when the constraint 
is imposed with the goodness of fit when all 
coefficients are estimated freely. If the constraints
hold, the lagged endogenous variable may be
due to serial correlation among the errors rather 
than being part of the causal model. 

Next, consider a distributed lag model, in 
which x influences y with a strength that dies 
out gradually, instead of dissipating fully in just 
one time period: 


$$y_t = a + b_1 x_t + b_2 x_{t-1} + b_3 x_{t-2} + \cdots + e_t + c_1 e_{t-1} + c_2 e_{t-2} + \cdots \tag{21}$$

Note that this equation, like equation (17), does
not contain a lagged value of y. It predicts the
level of y with no dynamic effects built in.
However, it contains an infinite series in the lagged
values of x. In addition, it is influenced by
lagged shocks at past times, as well as at time t.
In empirical research, there can only be a 
finite number of waves, so equation (21) would 
have to be truncated in order to estimate it. In 
addition, the estimation of many coefficients 
eats up degrees of freedom. To avoid these 
problems, researchers often suppose that the 
influence of x on y diminishes geometrically 
with time and that the influence of the lagged 
shocks drops off in the same manner.² With this
assumption, we can write equation (21) as


$$y_t = a + b_1\left(x_t + \lambda x_{t-1} + \lambda^2 x_{t-2} + \cdots\right) + e_t + \lambda e_{t-1} + \lambda^2 e_{t-2} + \cdots \tag{22}$$


If we lag this equation by one time unit, multi- 
ply it by lambda, subtract the result from equa- 
tion (22), and rearrange terms, we obtain 


$$y_t = a(1-\lambda) + \lambda y_{t-1} + b_1 x_t + e_t \tag{23}$$


This procedure eliminates the infinite series; 
now we have only an instantaneous effect of x 
on y, but a lagged endogenous variable. Because 
there are now no serially correlated errors, this 
equation can be estimated by OLS regression 
without bias. 
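
A small numerical check makes the algebra concrete. The sketch below builds y from the geometric distributed lag of equation (22) and then verifies that the same series satisfies the lagged endogenous form y_t = a(1 − λ) + λ y_{t−1} + b_1 x_t + e_t. The parameter values and the simulated x and e are hypothetical, and x and e are treated as zero before t = 0 so that the infinite sums are finite.

```python
import random

random.seed(0)
a, b1, lam = 2.0, 0.8, 0.6   # hypothetical parameter values
T = 50
x = [random.gauss(0, 1) for _ in range(T)]
e = [random.gauss(0, 1) for _ in range(T)]

def y_distributed_lag(t):
    """Equation (22), with x and e treated as zero before t = 0."""
    return a + sum(lam**s * (b1 * x[t - s] + e[t - s]) for s in range(t + 1))

y = [y_distributed_lag(t) for t in range(T)]

# Lagged endogenous form: y_t = a(1 - lam) + lam * y_{t-1} + b1 * x_t + e_t
max_gap = max(
    abs(y[t] - (a * (1 - lam) + lam * y[t - 1] + b1 * x[t] + e[t]))
    for t in range(1, T)
)
print(f"largest discrepancy between the two forms: {max_gap:.2e}")
```

The discrepancy is at the level of floating-point rounding, so the two equations describe the same process; they differ only in whether the history of x is carried explicitly or summarized by the lagged value of y.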

As these examples illustrate, a lagged endoge- 
nous variable can arise in a number of ways, 
and consequently the researcher should make 
an effort to differentiate these on the basis of 
both theoretical considerations and empirical 
evidence as to the constraints some of these pos- 
sibilities imply. To complicate matters further, 
Allison (1990) has pointed out that in some cir- 
cumstances the inclusion of a lagged dependent 
variable leads to counter-intuitive results, and 
that more meaningful results can be obtained 
by omitting it. These anomalies occur because
the presence of serially correlated errors in the
true model leads to bias when not taken into
account properly in the estimation.

² Because omitted variables are the most common
source of serially correlated errors, it is reasonable
that this should be approximately true.


4.3 Jointly dependent outcomes 


Thus far we have been assuming that causal 
influences flow in just one direction — from x 
to y either contemporaneously, or with a lag, or 
both. Sometimes, however, there may be theo- 
retical reasons for suspecting that x influences y 
and that y also influences x. There may also be 
reasons for thinking that any observed relationship
between x and y is not causal but spurious.
In circumstances like these, panel data can help
to clarify the relationships among variables.

For simplicity, consider the two-wave, two- 
variable model. The most general linear rela- 
tionship among the interval-level variables x 
and y can be expressed by a pair of equations, 
one for each endogenous variable. 


$$\begin{aligned} x_{i2} &= a_1 + b_{11} x_{i1} + b_{12} y_{i1} + b_{13} y_{i2} + e_i \\ y_{i2} &= a_2 + b_{21} x_{i1} + b_{22} y_{i1} + b_{23} x_{i2} + f_i \end{aligned} \tag{24}$$


In these equations e and f are random error 
terms. However, we do not assume that e is 
uncorrelated with f. This model is displayed 
visually in Figure 17.1, in which the Greek let- 
ter psi represents the covariance between the 
two error terms. 

When causal influences flow in just one 
direction, from x to y but not from y to x (as in 
equation (10)), the solutions either explode or 
approach an equilibrium level asymptotically. 


Figure 17.1 A two-wave, two-variable linear panel 
model 


Where x and y influence one another recipro- 
cally, the solutions can take on more complex 
forms. One can obtain solutions that explode, 
oscillate, or approach equilibrium monotoni- 
cally or in an oscillatory manner. The expres- 
sions for the equilibrium values of x and y 
are complicated; they are given in Kessler and 
Greenberg (1979, pp. 118-22). 
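
The qualitative behavior of a purely lagged reciprocal system can be read from the eigenvalues of its 2 × 2 coefficient matrix. The sketch below is only illustrative: it assumes that the same lagged coefficients apply at every step (an assumption a two-wave study cannot verify) and ignores intercepts and errors. The function name `classify_dynamics` is hypothetical, the first call uses the standardized lagged estimates as we read them from Panel A of Table 17.4, and the second uses made-up coefficients.

```python
import cmath

def classify_dynamics(b_xx, b_xy, b_yx, b_yy):
    """Classify the path of x_{t+1} = b_xx*x_t + b_xy*y_t, y_{t+1} = b_yx*x_t + b_yy*y_t."""
    trace = b_xx + b_yy
    det = b_xx * b_yy - b_xy * b_yx
    root = cmath.sqrt(trace**2 - 4 * det)
    eigenvalues = ((trace + root) / 2, (trace - root) / 2)
    largest = max(abs(ev) for ev in eigenvalues)
    # Complex or negative eigenvalues produce oscillating paths.
    oscillates = any(abs(ev.imag) > 1e-12 or ev.real < 0 for ev in eigenvalues)
    growth = "explodes" if largest > 1 else "approaches equilibrium"
    pattern = "with oscillation" if oscillates else "monotonically"
    return eigenvalues, f"{growth} {pattern}"

# Standardized lagged estimates read from Panel A of Table 17.4 (x = SL, y = THEFT).
eigs, label = classify_dynamics(b_xx=0.499, b_xy=0.050, b_yx=0.238, b_yy=0.418)
print([round(abs(ev), 3) for ev in eigs], label)

# Hypothetical coefficients with opposite-signed cross-effects.
_, label_osc = classify_dynamics(b_xx=0.5, b_xy=-0.6, b_yx=0.6, b_yy=0.5)
print(label_osc)
```

Under these assumptions, both eigenvalues for the theft and social life estimates are real, positive, and less than 1, which corresponds to a monotonic approach to equilibrium.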

In many applications, the correlation 
between the two error terms, e and f, is most 
plausibly attributed to unobserved variables 
that influence both x and y. Consequently, the
test for the significance of this correlation is
also, in part, a test for spuriousness.

Estimation of equation (24) poses two problems.
First, because $x_2$ and $y_2$ mutually influence
one another, ordinary least-squares estimates
of the parameters in the equations suffer from
simultaneity bias. To avoid this type of bias,
the estimation should use instrumental variable
methods (such as two-stage least squares) or
maximum-likelihood estimation.

Second, in the absence of some additional
information, equation (24) is under-identified.
In the two-wave model for two standardized
variables there are eight independent
parameters to be estimated:
the two autoregressive
coefficients, the two cross-lagged coefficients, 
the two cross-instantaneous coefficients, the 
correlation between the two time-one variables, 
and the correlation between the two prediction 
errors. Yet there are only six independent cor- 
relations among the four variables of the model 
available for carrying out the estimation. This 
means that it is impossible to obtain unique 
estimates of the parameters. 

Most researchers achieve identification by 
assuming that the cross-instantaneous influ- 
ences vanish, but it is also possible to assume 
that the cross-lagged influences vanish. If one 
makes either assumption, and allows the error
terms e and f to be correlated, the model is
just-identified. It will fit the variance-covariance
matrix perfectly. Once one has a model in
which the observed and theoretically predicted
variances and covariances do not differ
significantly, one can test individual coefficients
for significance with t-tests or likelihood-ratio
tests. To achieve a more parsimonious model,
researchers often drop insignificant effects, 
always making sure that the overall goodness 
of fit does not deteriorate significantly. 
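
The counting argument behind under-identification and just-identification can be written out explicitly. This is a minimal bookkeeping sketch for the standardized two-wave, two-variable model; the dictionary labels are our own shorthand for the parameters listed in the text.

```python
from math import comb

n_variables = 4                          # x1, y1, x2, y2
n_correlations = comb(n_variables, 2)    # distinct correlations available

parameters = {
    "autoregressive": 2,
    "cross-lagged": 2,
    "cross-instantaneous": 2,
    "time-1 correlation": 1,
    "error correlation": 1,
}
n_full = sum(parameters.values())

# Identifying restriction: fix one pair of cross-effects at zero.
n_lagged_only = n_full - parameters["cross-instantaneous"]
n_instantaneous_only = n_full - parameters["cross-lagged"]

print(n_correlations, n_full, n_lagged_only, n_instantaneous_only)
```

With six correlations and eight free parameters the full model has no unique solution; removing either pair of cross-effects leaves six parameters for six correlations, which is why the restricted models fit perfectly and have zero degrees of freedom.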

To illustrate these methods, we continue 
to consider the relationship between juve- 
nile delinquency and social life. We consider 
the possibility that each variable influences 
the other, either with a lag or contemporane- 
ously. First, consider a model in which there 
are only lagged cross-effects, but no cross- 
contemporaneous causal effects. We do not 
constrain the correlation of prediction errors. 
The chi-square statistic for this model is zero, 
indicating a perfect fit to the data. Maximum- 
likelihood estimates of the standardized coeffi- 
cients are shown in Panel A of Table 17.4, and 
below them, the t-statistic. The stability coeffi- 
cients are both less than 1, so that regression 
to the mean is present for both variables. The 
cross-lagged influences are both positive and 
statistically significant, but the standardized 


coefficient for the effect of social life on theft 
is considerably greater than the coefficient for 
the effect of theft on social life. Psi, the estimate 
of the correlation between the two prediction 
errors, is close to zero and not statistically sig- 
nificant, indicating that omitted variables that 
influence both theft and social life are not sig- 
nificantly affecting the estimates. 

We can modify the model so that the
cross-influences between social life and
theft are contemporaneous rather than lagged.
With the correlation of prediction errors esti- 
mated from the data, this model also fits the 
data perfectly. The estimates for this model are 
in Panel B of Table 17.4. All stability coef- 
ficients are again less than 1, and the cross- 
influences are of comparable magnitude to 
those in Panel A. However, the correlation of 
prediction errors is significantly different from 
zero, suggesting that omitted variable bias or 
some other form of misspecification may be 
present. 

Another way of testing for spuriousness 
is to consider models of the sort illustrated 
in Figure 17.2. In this model, the relationship 
between social life and involvement in theft is 


Table 17.4 The influence of social life on juvenile theft in a two-wave panel model (n = 1412)

                       Panel A                      Panel B
                 Dependent variables          Dependent variables
Independent
variables          SL2       THEFT2            SL2       THEFT2
SL1               .499        .238             .463
                (21.27)     (10.24)          (15.815)
THEFT1            .050        .418                         .367
                 (2.14)     (17.96)                      (14.710)
SL2                                                        .219
                                                         (2.120)
THEFT2                                         .262
                                             (9.138)
psi                    −.018                        −.459
                      (−.94)                       (−7.03)
R²                .263        .275             .248        .089
Figure 17.2 A two-wave, two-variable panel model
of complete spuriousness


entirely spurious, due to the influence of an
omitted variable z on both observed variables.
This variable is assumed to be first-order
autoregressive, i.e., $z_t = \rho z_{t-1} + e_t$. If one assumes that
the coefficients $a_j$ and $b_j$ are constant over time,
it follows from the rules of path analysis that the
cross-lagged correlations of SL1 with THEFT2
and THEFT1 with SL2 will be equal:


$$r_{SL1,THEFT2} = r_{THEFT1,SL2} \tag{25}$$


This is the Kenny criterion for spuriousness. However,
without making that assumption, we see that
another equality also holds:


$$r_{SL1,SL2}\, r_{THEFT1,THEFT2} = r_{SL1,THEFT2}\, r_{THEFT1,SL2} \tag{26}$$


Inserting the observed values for these correlations,
we see that neither equality is close to
being obeyed. The estimate of the left-hand side in
equation (25) is .33; the right-hand side is .16.
The left-hand side of equation (26) is (.51)(.47)
= .24; the right-hand side is (.33)(.16) = .05.
The chi-square test for the model as a whole 
is 218.41, with one degree of freedom. This 
is highly significant, indicating that the fit is 
grossly unacceptable. Consequently, we con- 
clude that the relationship between social life 
and theft is not entirely spurious. However, we 
cannot exclude the possibility that it is partly 
spurious. Consequently, the estimates for the 
causal models must be interpreted with caution; 
they are predicated on the model itself being 
correctly specified. 
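
The two informal checks on complete spuriousness amount to simple arithmetic on four correlations. The sketch below uses the values quoted in the text; the variable names are our own.

```python
# Correlations as quoted in the text for the two spuriousness checks.
r_sl1_sl2, r_theft1_theft2 = 0.51, 0.47
r_sl1_theft2, r_theft1_sl2 = 0.33, 0.16

# Kenny criterion: the two cross-lagged correlations should be equal.
gap_cross_lagged = r_sl1_theft2 - r_theft1_sl2

# Second equality: product of the two stabilities vs. product of the
# two cross-lagged correlations.
lhs_product = r_sl1_sl2 * r_theft1_theft2
rhs_product = r_sl1_theft2 * r_theft1_sl2

print(f"cross-lagged difference = {gap_cross_lagged:.2f}")
print(f"products: lhs = {lhs_product:.2f}, rhs = {rhs_product:.2f}")
```

Neither comparison is a formal test, since it ignores sampling error; the chi-square statistic reported in the text plays that role.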


We can now return to the comparison of 
cross-lagged correlations to see the superior- 
ity of structural equation modeling. To test for 
the significance of the difference between the 
correlation of SL1 with THEFT2 and THEFT1 
with SL2, one would compute the Pearson-Filon
statistic (Kenny, 1979, pp. 238-39), and
discover that the statistic in this case is statisti- 
cally significant. One would then conclude that 
social life influences theft more than theft influ- 
ences social life. A structural equation modeler 
can carry out the same test by constraining 
the two cross-influences to be the same and 
comparing the chi-square statistics of the con- 
strained and unconstrained models. Because 
the two models are nested (one is a special case 
of the other), the difference in chi-square statis- 
tics is itself a chi-square statistic, one which, 
in the present case, is highly significant
(chi-square = 14.82 with one degree of freedom,
p = .00012). We thus reach the same conclusion
as would be obtained from the CLPC approach, 
but we learn a great deal more. In particular, 
we learn that both cross-influences are statis- 
tically significant, something that would not 
be learned in a CLPC analysis. In addition, we 
obtain numerical estimates of the parameters in 
the model. 
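
The p-value attached to a chi-square difference with one degree of freedom can be reproduced with the standard library alone, because a chi-square variate with one degree of freedom is the square of a standard normal variate. The helper name `chi2_sf_1df` is hypothetical; the statistic is the one reported in the text.

```python
from math import erfc, sqrt

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))

# Difference in chi-square between the constrained and unconstrained models.
chi2_diff = 14.82
p_value = chi2_sf_1df(chi2_diff)
print(f"p = {p_value:.5f}")
```

For differences with more than one degree of freedom this shortcut does not apply, and a general chi-square routine from a statistics package would be needed.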


4.4 Pooling methods 


Pooling methods offer a different way of deal- 
ing with the question of spuriousness (Johnston 
and DiNardo, 1997, pp. 388-409; Greene, 2000; 
Wooldridge, 2001; Baltagi, 2005; see also Wor- 
rall, Chapter 15 and Finkel, Chapter 29 in 
this volume). Consider this equation for the
influence of x at time t on y at the same
time. In recognition of the possibility that
additional unmeasured variables may influence y,
we add to the conventional regression equation
a term $\alpha_i$ that is constant over time. We can then
write our regression equation as


$$y_{it} = a + b x_{it} + \alpha_i + e_{it} \tag{27}$$
The term in alpha introduces unmeasured
heterogeneity into the equation. It is tantamount
to introducing a dummy variable for every case
into the regression equation. This term gives
each case its own intercept. The slope, however,
is taken to be the same for all cases. We
make the conventional assumptions about the
$e_{it}$: they are normally distributed with a mean
of zero, and are uncorrelated with $x_{it}$. They are
independent and identically distributed.

If we only had cross-sectional data, we could 
not estimate equation (27) without additional 
assumptions. With panel data, however, several 
estimation strategies are possible. It is tempting 
simply to “stack” the data, so that all the obser- 
vations at time t = 1 come first, followed by 
all the observations at time 2, then all those at 
time 3, and so forth. If there are N cases and T
waves, this would mean NT observations.

However, because the same cases are 
observed at each wave, the observations are not 
independent. Statistically speaking, the error 
term for a given case at a given time is 


$$u_{it} = \alpha_i + e_{it} \tag{28}$$


The common presence of alpha will create a 
positive correlation between the values of u at 
different times. This violates the OLS assump- 
tion that residuals are independent. 

One way to estimate the parameters that does
not ignore this complication is to assume that
the alphas are uncorrelated with the $x_{it}$. When
this assumption is made, we have a random
effects model. The coefficient b can be estimated
by means of feasible generalized least
squares, a procedure that utilizes information
about variation between cases and over time.
The estimates are consistent, that is to say,
unbiased as the sample size grows without limit. In
this procedure we do not estimate the individual
effects (the $\alpha_i$).

An important limitation to this method is that
it will yield biased estimates of b if the $\alpha_i$ are in
fact correlated with the $x_{it}$. Most of the time we
don't know what variables have been omitted,


and consequently have no theoretical grounds 
for assuming that they are uncorrelated. 

An alternative approach is to estimate a fixed 
effects model. We evaluate equation (27) at 
the mean of each variable (the mean is taken 
over time) 


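$$\bar{y}_i = a + b\bar{x}_i + \alpha_i + \bar{e}_i \tag{29}$$


The means of a and $\alpha_i$ are, of course, just a and
$\alpha_i$, respectively. Subtracting equation (29) from
equation (27), we have


$$(y_{it} - \bar{y}_i) = b(x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i) \tag{30}$$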


The constant term and the individual effects 
have all dropped out. Moreover, the observa- 
tions are independent of one another across 
time. This equation can be estimated by OLS 
regression. This estimate uses only informa- 
tion about the variation in scores over time; it 
doesn’t make use of whatever information is 
contained in the cross-sectional variation. The 
estimates of b will be consistent provided that 
strict exogeneity holds. This means that $x_{it}$ is
uncorrelated with the residuals not just at time
t, but at all times, past, present, and future. This
requirement is quite stringent; it precludes long-term
influences of y on x. In many applications,
this assumption is dubious. The precision with
which the $\alpha_i$ are estimated does
not increase with the sample size, but it does
increase with the number of waves.
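
A small simulation illustrates why the within transformation matters. The sketch below is not drawn from any of the data sets discussed here: the design (200 cases, 5 waves), the true slope b = 2, and the way x is made to depend on the unmeasured case effect are all hypothetical choices. It compares the slope from stacked OLS, which ignores the case effects, with the slope obtained after subtracting each case's own means.

```python
import random
from statistics import mean

random.seed(1)
N, T, a, b = 200, 5, 1.0, 2.0      # hypothetical design and true slope

rows = []                           # (case id, x, y)
for i in range(N):
    alpha = random.gauss(0, 1)      # unmeasured stable attribute of case i
    for _ in range(T):
        x = alpha + random.gauss(0, 1)              # x is correlated with alpha
        y = a + b * x + alpha + random.gauss(0, 0.5)
        rows.append((i, x, y))

def ols_slope(xs, ys):
    """Bivariate OLS slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    return sxy / sxx

# Stacked OLS ignores alpha, so its slope absorbs part of alpha's effect.
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])

# Within transformation: subtract each case's own means over time.
x_dev, y_dev = [], []
for i in range(N):
    xs = [r[1] for r in rows if r[0] == i]
    ys = [r[2] for r in rows if r[0] == i]
    x_dev += [u - mean(xs) for u in xs]
    y_dev += [v - mean(ys) for v in ys]
within = ols_slope(x_dev, y_dev)

print(f"stacked OLS slope: {pooled:.3f}, within (fixed effects) slope: {within:.3f}")
```

In this setup the stacked estimate is pulled away from the true slope because x carries information about the omitted case effect, while the within estimate stays close to it. Standard errors for the within estimate would also need a degrees-of-freedom correction for the N case means, which this sketch does not compute.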

An important disadvantage of the fixed effect 
model is that any time-invariant observed 
variables will drop out in the subtraction. Con- 
sequently, this method will provide no informa- 
tion about the effects of stable personality traits, 
regional subcultures and the like, even if these 
variables have been measured. This can be a 
major liability. Moreover, the fixed effects estimator
is less efficient than the random effects estimator.
Consequently, one might prefer to use the
random effects estimator, if one could feel
confident that the $\alpha_i$ were indeed uncorrelated with
the independent variables.³ The Wu-Hausman
test (see, e.g., Hausman, 1978) can be used to 
determine whether there are significant differ- 
ences between the estimates obtained from the 
two procedures. 

It is common, when carrying out either
method of estimation, to introduce fixed effects
for each wave (except the first). This controls 
for trends in the outcome that are common to 
all cases. For example, in a study of American 
states, this would control for developments that 
affect all states equally. These fixed effects can 
be introduced by adding dummy variables for 
each wave but one to the model. 

These methods are illustrated by Phillips 
and Greenberg’s (2005) study of the factors 
that influence homicide rates in 400 Ameri- 
can counties during the years 1985-1999. Fixed 
effects and random effects estimators for the 
effects of 11 independent variables obtained 
using Stata version 9.1 are shown in Table 17.5. 
Note that some of the coefficients are significant 
in the random effects model but not in the fixed 
effects model. The population variables are 
significant in both models, but with opposite 
signs. These differences make it important to 
know which model to believe. A Wu-Hausman 
test prefers the fixed effects model. Evidently 
some of the effects that are significant in the 
random effects estimation are actually spurious. 
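
The logic of the Wu-Hausman comparison can be illustrated for a single coefficient. The sketch below is not the test reported above, which compares the two sets of estimates jointly; it is a one-coefficient version, computed from the rounded values printed in Table 17.5 for percent divorced, and the function name `hausman_scalar` is hypothetical.

```python
from math import erfc, sqrt

def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for one coefficient: (b_FE - b_RE)^2 / (Var_FE - Var_RE)."""
    var_gap = se_fe**2 - se_re**2
    if var_gap <= 0:
        raise ValueError("FE variance must exceed RE variance for this form of the test")
    return (b_fe - b_re) ** 2 / var_gap

# Percent divorced, as printed in Table 17.5.
h = hausman_scalar(b_fe=0.023, se_fe=0.013, b_re=0.104, se_re=0.008)
p_value = erfc(sqrt(h / 2.0))   # chi-square upper tail with 1 degree of freedom
print(f"H = {h:.1f}, p = {p_value:.1e}")
```

A large statistic indicates that the two estimators disagree by more than their sampling variability would allow, which points toward the fixed effects estimates. Because the inputs are rounded to three decimals, the value should be read as an order of magnitude only.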

As conventionally implemented, these meth- 
ods do not furnish the overall goodness-of-fit 
test that structural equation modeling provides. 
Recently, Paul Allison (2005) has shown how 
the fixed effects model can be cast in a SEM 
framework, so that the diagnostic indicators 
that SEM provides can be used here as well. 

A complication, until recently given little
attention in discussion of panel models,
concerns stationarity. It has long been known in
time series methodology that conventional
statistical methods break down if the data being
analyzed are not stationary. Stationarity holds
when means, variances, and covariances are
constant over time. Trends constitute one
commonly encountered form of nonstationarity.
When present they can lead to spurious
estimates. Nonstationarity can also occur in panel
data. There are now tests for it, and special
methods for handling it when it occurs.

³ If that assumption is untenable, an instrumental
variables estimation is still possible.

The basic approach just outlined can eas- 
ily be modified to accommodate lagged influ- 
ences and serially-correlated errors. If there are 
reciprocal influences, so that some of the inde- 
pendent variables are endogenous, or if one 
of the predictors is a lagged endogenous vari- 
able (a dynamic panel model), special esti- 
mation techniques (instrumental variables) are 
needed. 

A different estimation technique is also 
needed when the number of cases in the data 
set is fairly small. Though the random effects 
estimator may be consistent, its small-sample 
properties are not well understood. Simulations 
carried out by Nathaniel Beck and Jonathan 
Katz (1995) demonstrate that the standard errors
obtained in this way are much too
small. They propose to use OLS estimates with
standard errors corrected to reflect the panel 
structure of the data. Their panel-corrected 
standard errors approach performs much better 
in simulations. 

By writing the equations being pooled with 
an error term that is taken to be normally 
distributed, we are implicitly saying that the 
dependent variable is also continuous. Pool- 
ing models can also be used when the depen- 
dent variable is categorical, ordinal, or count. 
However, there is no fixed effect estimator for 
analyzing logits and probits from panel data, 
because there is no way to eliminate the fixed 
effects. 


4.5 Latent growth curve modeling 


Researchers studying human development have 
developed still another method for carrying 
out causal analysis with panel data — latent 
Table 17.5 Fixed effects and random effects models of logged homicide rate

                              Fixed effects model        Random effects model
Variable                     Coefficient  Std. error    Coefficient  Std. error
Intercept                       0.811       1.225
% divorced                      0.023       0.013#         0.104       0.008*
Unemployment                   −0.005       0.004          0.007       0.004
Per capita income (000$)       −0.004       0.007         −0.011       0.004*
% aged 15-34                    0.036       0.008*         0.010       0.004*
% male                         −0.008       0.026         −0.046       0.014*
% black                         0.024       0.007*         0.038       0.002*
Population (000,000)           −0.667       0.120*         0.245       0.046*
Population²                     0.051       0.018*        −0.016       0.007*
West                                                       0.298       0.054*
South                                                      0.255       0.045*
Northeast                                                  0.065       0.046
R-square
  Within counties               0.086                      0.070
  Between counties              0.203                      0.809
  Overall                       0.184                      0.692
Fraction of variance
  due to fixed effects          0.814                      0.457

Note: Both models include fixed effects for years, but the estimates are omitted from the table.
# Significant at the .10 level
* Significant at the .05 level


growth curve models. This approach postulates 
a “level-one” model that characterizes the over- 
all growth pattern in the data set. It takes the 
form of a linear or quadratic equation in time, 
with a random error term: 


$$y_{it} = a_i + b_{1i} t + b_{2i} t^2 + e_{it} \tag{31}$$


We make the standard assumptions about the 
residuals. What is distinctive about this equa- 
tion is that the coefficients have subscripts. 
Latent growth curve models allow the inter- 
cepts and slopes to vary from case to case. This 
extends the assumption in pooling methods that 
intercepts, but not slopes, can vary from case to 
case. 


Variables that influence the intercepts and 
slopes are specified in a “level-two” model, 
such as 


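$$\begin{aligned} a_i &= \gamma_{00} + \gamma_{01} x_{it} + u_{0i} \\ b_{1i} &= \gamma_{10} + \gamma_{11} x_{it} + u_{1i} \\ b_{2i} &= \gamma_{20} + \gamma_{21} x_{it} + u_{2i} \end{aligned} \tag{32}$$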


The coefficients in the level-two equations for
the slopes ($b_1$ and $b_2$) can be interpreted as
interaction effects. They indicate that the time 
dependence of the growth curve varies system- 
atically (nonrandomly) across cases, or, equiva- 
lently, that the effect of x on y depends on time. 
Additional explanatory variables can be added 
Table 17.6 Latent growth curve model of logged homicide rates

                               Intercept           Linear term         Quadratic term
Variable                     Coeff.    SE         Coeff.    SE         Coeff.     SE
Intercept                    1.398    0.852       0.223    0.206       0.001     0.014
% divorced                   0.108    0.013*      0.001    0.003       0.0001    0.000
Unemployment                 0.018    0.007*      0.001    0.002      −0.0002    0.000
Per capita income (000$)    −0.005    0.008      −0.003    0.002#      0.0002    0.000
% aged 15-34                 0.006    0.006      −0.001    0.001       0.0001    0.000
% male                      −0.028    0.018      −0.003    0.004      −0.0001    0.000
% black                      0.028    0.002*      0.003    0.000*     −0.0002    0.000*
Population (000,000)         0.450    0.065*     −0.006    0.014      −0.0008    0.001
Population²                 −0.043    0.010*      0.002    0.002       0.0000    0.000
West                         0.322    0.064*     −0.024    0.015       0.0018    0.001#
South                        0.521    0.054*     −0.050    0.012*      0.0013    0.001
Northeast                    0.051    0.058      −0.029    0.014*      0.0013    0.001
−2 log likelihood            4233.2

Note: # = significant at the .10 level;
* = significant at the .05 level.


at will. Moreover, one can construct a “level- 
three” equation in which the coefficients of the 
“level-two” equation depend on a set of still 
other variables. For example, in studying test 
scores of schoolchildren, the level-two model 
might specify attributes of the child that influ- 
ence the level and rate of improvement in scores 
over time, while the level-three equation might 
allow these influences to depend on character- 
istics of the school or neighborhood. 
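
The interpretation of the level-two slope coefficients as interaction effects can be verified by substitution. In the sketch below the gamma values are hypothetical, the residual terms are set to zero, and x is treated as a single fixed value per case for simplicity. One function evaluates the level-two equations and plugs the results into the level-one equation; the other writes the same prediction as a single equation with x·t and x·t² interaction terms.

```python
# Hypothetical level-two coefficients gamma_jk: j = 0 intercept, 1 linear, 2 quadratic.
GAMMA = {(0, 0): 1.4, (0, 1): 0.10,
         (1, 0): 0.20, (1, 1): -0.03,
         (2, 0): 0.001, (2, 1): 0.002}

def predict_two_level(x, t):
    """Evaluate the level-two equations, then the level-one equation (errors set to zero)."""
    a_i = GAMMA[0, 0] + GAMMA[0, 1] * x
    b1_i = GAMMA[1, 0] + GAMMA[1, 1] * x
    b2_i = GAMMA[2, 0] + GAMMA[2, 1] * x
    return a_i + b1_i * t + b2_i * t**2

def predict_combined(x, t):
    """Single-equation form: main effects of x, t, t^2 plus x*t and x*t^2 interactions."""
    return (GAMMA[0, 0] + GAMMA[0, 1] * x
            + GAMMA[1, 0] * t + GAMMA[2, 0] * t**2
            + GAMMA[1, 1] * x * t + GAMMA[2, 1] * x * t**2)

grid = [(x, t) for x in (0.0, 1.5, 3.0) for t in range(6)]
max_gap = max(abs(predict_two_level(x, t) - predict_combined(x, t)) for x, t in grid)
print(f"largest difference: {max_gap:.2e}")
```

The two forms agree to rounding error, which shows that a nonzero coefficient of x in a slope equation is the same thing as a product term between x and time in the combined equation.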

A comparison with the pooling approach
is instructive. The latent growth curve model
specifies a particular functional dependence on
time, while the pooled model, by using dummies
for each time, does not impose any temporal
dependence on the model.⁴ As usually
employed, the latent growth curve model does
not incorporate fixed effects for unmeasured
stable attributes of cases, though this can be
done. The pooled model does not have
interaction terms between the explanatory variables
and time, though these can be added to the
model should one wish to do so.

⁴ The growth curve approach can accommodate other
types of time dependence, but this is rarely done.

Analyzing the county homicide data in this 
way, Phillips and Greenberg (2005) obtained 
the maximum-likelihood estimates shown in 
Table 17.6 using SAS version 9.1. Comparison
of the two sets of estimates shows some
similarities; e.g., percent divorced and percent
black both have significantly positive effects
on homicide rates in both models. Other vari- 
ables, however, are significant in one model but 
not the other. For example, unemployment fails 
to influence homicide rates significantly in the 
pooled model, but has a significantly positive 
effect in the growth curve model. These dif- 
ferences reflect differences in the assumptions 
each model makes about the processes generat- 
ing homicide rates. 

As implemented in software packages, rou- 
tines for doing this type of estimation do not 
readily accommodate models in which there are 
mutually dependent variables. However, this 
type of estimation can be done easily when the 
multilevel model is recast as a structural
equation model (Bollen and Curran, 2005).


5 Independent cross-sections 


Sometimes panel data are not available, but 
repeated cross-sectional data (independent 
cross-sections at more than one time) are. 
Because the same cases do not appear at 
each time, it is impossible to take differences. 
Though aggregate change can be observed, one 
doesn’t know which cases in a sample have 
changed. This poses limits to the types of causal 
analysis that can be conducted. Obviously, 
fixed effects models cannot be estimated. Nev- 
ertheless, it is possible to carry out causal anal- 
yses. The independence of observations means 
that we can legitimately stack the data and 
ignore the fact that we have multiple waves 
when we estimate our regression equations. 

One technique, "difference-in-difference",
is particularly useful for analyzing
the effects of discrete events from independent 
cross-sections. Suppose that we have observa- 
tions for dependent variable y both before and 
after an event of some sort happens to some, 
but not all of the cases. The event could be a 
planned intervention, such as the adoption of a 
new law or policy; or it may be an unplanned, 
exogenous event such as a natural disaster. The 
variable Z represents the innovation. We code it 
0 for cases to which the event did not happen, 
and 1 for those that experienced the event. 

A naive approach would be to estimate a 
regression equation of the form 


$$y_i = a + b Z_i + e_i \tag{33}$$


using just the post-event data. Doing this we 
could estimate $a = \bar{y}$ using only data for cases
that did not experience the event, and $a + b = \bar{y}$
for the cases that did. By subtraction we obtain 
an estimate of b, which expresses the impact 
of the event. This procedure, however, fails to 
take into account differences between the cases 
that may have existed before the innovation 


was introduced. Consequently, the possibility 
of spuriousness cannot be ruled out. 

An alternative naive strategy would be to 
compare levels of y in adopting locations before
and after the innovation was adopted. The 
estimating procedure would be essentially the 
same, but this time the same cases would be 
compared at two times. This approach would 
control for pre-existing differences among the 
locations, but would not exclude the possibility 
that the changes are due to some other cause 
than the new policy. 

The “difference-in-difference” strategy con- 
trols for both possibilities (Ashenfelter and 
Card, 1985; Abadie, 2005). Calling the time
before any events occurred t = 0, and the time after
they occurred t = 1, we write the regression
equation as 


$y_{it} = a_0 + a_{1i} + a_2 t + bZ_{it} + e_{it}$ (36)


The first coefficient represents a contribution to 
the outcome that is the same for all places at 
both times. The second represents factors that 
differ from one case to another but are constant 
over time. They might, on average, differ
between places that fail to adopt a new policy
and places that do, or between families that undergo
a divorce and those that do not. The third
coefficient represents trends that are common 
to all places. The coefficient b continues to rep- 
resent the effect of the event. 

Taking the mean of this equation at each time 
and evaluating it when t = 0 and 1, and when
Z = 0 and 1, we obtain four equations that can 
be solved for the four parameter estimates. The 
estimate for b is 


$b = [\bar{y}(t=1, Z=1) - \bar{y}(t=0, Z=1)] - [\bar{y}(t=1, Z=0) - \bar{y}(t=0, Z=0)]$ (37)


This is the “difference in difference”. It is the 
difference between the change in the dependent 
variable for cases that experienced the event, 
and the change for cases that did not. It can 
also be expressed as the difference in outcomes
between the two sets of cases after the events 
occurred, less the difference in outcomes before 
any events occurred. 
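
The calculation in equation (37) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name diff_in_diff, the tuple layout (y, t, Z), and the eight observations are hypothetical and do not come from any study discussed here.

```python
from statistics import mean

def diff_in_diff(rows):
    """Difference-in-difference estimate of b, as in equation (37).

    rows: list of (y, t, z) tuples, with t in {0, 1} (before/after)
    and z in {0, 1} (did not / did experience the event).
    """
    def cell_mean(t, z):
        return mean(y for y, tt, zz in rows if tt == t and zz == z)

    change_event = cell_mean(1, 1) - cell_mean(0, 1)
    change_no_event = cell_mean(1, 0) - cell_mean(0, 0)
    return change_event - change_no_event

# Hypothetical pooled cross-sections (different cases at each time).
rows = [
    (10.0, 0, 0), (12.0, 0, 0),  # no event, before: mean 11
    (12.0, 1, 0), (14.0, 1, 0),  # no event, after:  mean 13
    (14.0, 0, 1), (16.0, 0, 1),  # event, before:    mean 15
    (19.0, 1, 1), (21.0, 1, 1),  # event, after:     mean 20
]

b_hat = diff_in_diff(rows)  # (20 - 15) - (13 - 11) = 3.0

# Equivalent form: difference after the event less difference before.
alt = (20.0 - 13.0) - (15.0 - 11.0)  # 3.0
```

Because only the four cell means enter the calculation, the same cases need not be observed at both times, which is why the estimator suits independent cross-sections.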

The method can be broadened to include 
additional explanatory variables, and to cir- 
cumstances in which the innovation variable is 
not dichotomous, but the level of a quantita- 
tive variable. In the form presented here, the 
method implicitly assumes that in the absence 
of any exogenous events, the two sets of cases 
would undergo the same temporal change in the 
dependent variable. With more than two waves, 
each set of cases can be allowed to have its own 
inherent growth rate (Abadie, 2005). 

In some contexts, the implications of an event 
should be different for different groups of peo- 
ple. For example, civil rights laws barring racial 
discrimination in employment should have dif- 
ferent impacts on employment for white and 
black workers. To study the impact of state 
laws barring racial discrimination prior to the 
adoption of a federal civil rights law, Collins 
(2003) looked for such differences using a 
“difference-in-difference-in-difference” analysis.
His analysis first looked at before-and-
after changes in employment and wages for 
states that adopted laws barring racial discrim- 
ination as compared with states that did not. 
These differences were computed separately for 
white and black workers, and the differences 
differenced. 


6 Software 


The types of analysis discussed under Section 
3 Qualitative Outcomes can be done with virtu- 
ally any statistical software package. This is not 
true for the analyses described under Quantita- 
tive Outcomes. Pooled methods are most easily 
implemented with packages designed for this 
purpose, e.g., Stata, SAS, Limdep and Eviews. 
Structural equation modeling can be carried 
out with such specialized programs as LISREL, 
Amos, EQS and Mplus. It can also be done in 
SAS, using PROC CALIS. Multi-level modeling
can be carried out in special stand-alone programs
such as HLM and MLwiN, and in some
general statistical packages, including SPSS,
Stata and SAS.
Chapter 18


Causal inference in longitudinal 
experimental research 
Jos W. R. Twisk 


1 Introduction 


Randomized controlled trials (RCTs) are con- 
sidered to be the gold standard for evaluating 
the effect of a certain intervention (Rothman 
and Greenland, 1998). In a randomized con- 
trolled trial, the population under study is ran- 
domly divided into an intervention group and 
a nonintervention, or reference, group (e.g., a 
placebo group or a group with “usual” care). 
When discussing the analysis of causal infer- 
ence in experimental research a distinction 
must be made between studies with only one 
follow-up measurement and studies with more 
than one follow-up measurement. When there 
is only one follow-up measurement relatively 
simple statistical techniques can be used to 
evaluate the effect of the intervention, while 
when more than one follow-up measurement is 
considered, in general, more sophisticated sta- 
tistical techniques are necessary. 


2 Experimental research with only 
one follow-up measurement 


In most RCTs, besides the follow-up measure- 
ment a baseline measurement is also performed. 
With the information gathered in this base- 
line measurement, it is possible to compare the 
changes in the outcome variable between the
intervention and the reference group. Although
this procedure looks quite straightforward, the 
definition of change can be complicated. In 
fact, since the beginning of the 1960s, there 
has been an ongoing debate about how to define
“change” (Bereiter, 1963; Cronbach and Furby, 
1970; Plewis, 1985; Gottman, 1995). When eval- 
uating the RCT literature, in most research sit- 
uations the absolute change between a baseline 
measurement and a follow-up measurement is 
calculated, and this absolute change in a cer- 
tain outcome variable is compared between the 
groups of interest. However, there are many 
ways to define changes between a baseline and 
a follow-up measurement (Twisk, 2003), and 
therefore there are many ways to evaluate the 
results of a (randomized controlled) trial. 


2.1 Changes between baseline and 
follow-up: continuous outcome variables 


As mentioned before, the simplest method is to 
calculate the absolute difference between two 
measurements over time [equation (1)]. 


$\Delta Y = Y_{it_2} - Y_{it_1}$ (1)


where: $Y_{it_2}$ = observations for subject i at time
$t_2$; and $Y_{it_1}$ = observations for subject i at time $t_1$.

One of the typical problems related to the use 
of absolute change in RCTs to evaluate the effect 
of an intervention is the phenomenon of regression
to the mean. If the outcome variable at t=1
is a sample of random numbers, and the out- 
come variable at t= 2 is also a sample of random 
numbers, then the subjects in the upper part of 
the distribution at t=1 are less likely to be in 
the upper part of the distribution at t=2, com- 
pared to the other subjects. In the same way, the 
subjects in the lower part of the distribution at 
t=1 are less likely than the other subjects to be 
in the lower part of the distribution at t= 2. The 
consequence of this is that, just by chance, the 
change between t=1 and t=2 is correlated with
the initial value. Another consequence is that 
when the intervention group and control group 
differ at baseline, a comparison of the absolute 
changes between the groups can lead to either 
an overestimation or an underestimation of the 
intervention effect (Twisk and Proper, 2004). 
There are, however, some ways in which it is 
possible to define changes between subsequent
measurements, more or less “correcting” for the 
phenomenon of regression to the mean. One of 
these possibilities is the use of a relative dif- 
ference between two measurements over time 
instead of the absolute difference [equation (2)] 


$\Delta Y = \frac{(Y_{it_2} - Y_{it_1})}{Y_{it_1}} \times 100\%$ (2)


where: $Y_{it_2}$ = observations for subject i at time
$t_2$; and $Y_{it_1}$ = observations for subject i at
time $t_1$.
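
The regression-to-the-mean argument can be checked with a small simulation. The sketch below is illustrative only: it assumes two independent samples of normally distributed random numbers (mean 10 and standard deviation 1, chosen arbitrarily) and uses a hypothetical helper function, pearson. It computes the absolute change of equation (1) and the relative change of equation (2) and correlates each with the initial value.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(0)
n = 5000
# Two measurements that are pure random numbers, independent of each other.
y1 = [random.gauss(10, 1) for _ in range(n)]
y2 = [random.gauss(10, 1) for _ in range(n)]

abs_change = [b - a for a, b in zip(y1, y2)]              # equation (1)
rel_change = [100 * (b - a) / a for a, b in zip(y1, y2)]  # equation (2)

r_abs = pearson(y1, abs_change)  # clearly negative
r_rel = pearson(y1, rel_change)  # also clearly negative
```

For independent measurements with equal variance, the correlation between the initial value and the absolute change is expected to be about -0.71 (that is, -1/sqrt(2)), even though no real change has taken place. In this particular setup the relative change is negatively correlated with the initial value as well.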

Although it has been suggested that the use 
of relative change corrects for the phenomenon 
of regression to the mean, this is not always the case.
Figure 18.1 illustrates this artefact. 

Figure 18.1 shows the development over time
in a certain continuous outcome variable for
an intervention and a control group. In the
first part of the figure, the intervention group
decreases from 4 to 3 (i.e., a 25% decrease),
while the control group decreases from 3 to 2
(i.e., a 33% decrease). Using the relative change
in this situation will more or less correct for the
phenomenon of regression to the mean, because
the change in the control group is “more difficult”
and therefore an equal absolute change
will be evaluated in favor of the control group.
The situation is different, however, when the
outcome variable increases over time, which is
illustrated in the second part of Figure 18.1.
Here the intervention group increases from 3
to 4 (i.e., a 33% increase), while the control
group increases from 2 to 3 (i.e., a 50% increase).
Using the relative change in this situation will
not correct for the phenomenon of regression
to the mean. In fact, it will exacerbate the problem.


Figure 18.1 Relative change does not always
correct for the phenomenon of regression to the
mean


Another approach with which to define changes
while correcting for regression to the mean is
known as “analysis of covariance” [equation (3)].
With this technique the value of the outcome
variable Y at the second measurement is used as
the outcome variable in a linear regression
analysis, with the observation of the outcome
variable Y at the first measurement as one of
the predictor variables (i.e., as a covariate):


$Y_{it_2} = \beta_0 + \beta_1 Y_{it_1} + \beta_2 X_i + \cdots$ (3)


where: $Y_{it_2}$ = observations for subject i at
time $t_2$; $\beta_1$ = regression coefficient for $Y_{it_1}$;
$Y_{it_1}$ = observations for subject i at time $t_1$;
$\beta_2$ = regression coefficient for $X_i$; and
$X_i$ = intervention variable.
In the analysis of covariance, the change
is defined relative to the value of Y at t=1.
This relativity is expressed in the regression
coefficient $\beta_1$ and, therefore, it is assumed
that this method ‘corrects’ for the phenomenon
of regression to the mean. In fact the effect
of the intervention is evaluated assuming the
same baseline value for both groups. Some
researchers argue that the best way to define
changes, correcting for the phenomenon of
regression to the mean, is a combination of
equations (1) and (3). They suggest calculating
the absolute change between $Y_{it_2}$ and $Y_{it_1}$,
correcting for the value of $Y_{it_1}$ [equation (4)].


$Y_{it_2} - Y_{it_1} = \beta_0 + \beta_1 Y_{it_1} + \beta_2 X_i + \cdots$ (4)


where: $Y_{it_2}$ = observations for subject i at time
$t_2$; $Y_{it_1}$ = observations for subject i at time $t_1$;
$\beta_1$ = regression coefficient for $Y_{it_1}$; $\beta_2$ = regression
coefficient for $X_i$; and $X_i$ = intervention
variable.

However, analyzing the change, correcting for 
the initial value at t=1, is exactly the same as 
the analysis of covariance described in equa- 
tion (3). This can be seen when equation (4) is 
written in another way [equation (5)]. The only 
difference between the models is that the regres- 
sion coefficient for the initial value is different; 
i.e., the difference between the regression coef- 
ficients for the initial value is equal to 1. 


$Y_{it_2} = \beta_0 + (\beta_1 + 1) Y_{it_1} + \beta_2 X_i + \cdots$ (5)


where: $Y_{it_2}$ = observations for subject i at time
$t_2$; $\beta_1$ = regression coefficient for $Y_{it_1}$; $Y_{it_1}$ =
observations for subject i at time $t_1$; $\beta_2$ = regression
coefficient for $X_i$; and $X_i$ = intervention
variable.
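
The equivalence of equations (3) and (4) can be verified numerically. The sketch below is a minimal illustration, not a recommended analysis routine: the data are simulated with arbitrary, hypothetical parameter values, and ols is a small helper written for this example that solves the normal equations directly.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination. X is a list of rows."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

random.seed(1)
n = 200
x = [i % 2 for i in range(n)]                  # 0 = control, 1 = intervention
y1 = [random.gauss(50, 10) for _ in range(n)]  # baseline measurement
y2 = [20 + 0.6 * a + 5 * g + random.gauss(0, 5) for a, g in zip(y1, x)]

X = [[1.0, a, g] for a, g in zip(y1, x)]      # intercept, baseline, group
b3 = ols(X, y2)                               # equation (3): outcome is Y at t2
b4 = ols(X, [b - a for a, b in zip(y1, y2)])  # equation (4): outcome is the change

same_effect = abs(b3[2] - b4[2]) < 1e-6       # identical intervention effect
slope_gap = b3[1] - b4[1]                     # equals 1, as in equation (5)
```

The estimated intervention effect is the same in both models; only the coefficient for the baseline value shifts by exactly 1.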

All techniques discussed so far are suitable
in situations in which the continuous outcome
variable theoretically ranges from 0 to $+\infty$, or
from $-\infty$ to 0, or from $-\infty$ to $+\infty$. Some variables
(e.g., scores on questionnaires) have maximal
possible values (“ceilings”) and/or minimal
possible values (“floors”). To take these “ceilings”
and/or “floors” into account, the definition
of change can be as shown in equation (6).


when $Y_{it_2} > Y_{it_1}$: $\Delta Y = \frac{(Y_{it_2} - Y_{it_1})}{(Y_{max} - Y_{it_1})} \times 100\%$ (6a)

when $Y_{it_2} < Y_{it_1}$: $\Delta Y = \frac{(Y_{it_2} - Y_{it_1})}{(Y_{it_1} - Y_{min})} \times 100\%$ (6b)

when $Y_{it_2} = Y_{it_1}$: $\Delta Y = 0$ (6c)


where: $Y_{it_2}$ = observations for subject i at time
$t_2$; $Y_{it_1}$ = observations for subject i at time
$t_1$; $Y_{max}$ = maximal possible value of Y (“ceiling”);
and $Y_{min}$ = minimal possible value of Y
(“floor”).
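
A definition of change that respects ceilings and floors is easy to compute per subject. The function below is a sketch that assumes equation (6) expresses the change as a percentage of the room left toward the ceiling (for increases) or toward the floor (for decreases); the function name bounded_change and the 0 to 10 scale in the examples are hypothetical.

```python
def bounded_change(y1, y2, y_min, y_max):
    """Change between two measurements as a percentage of the maximal
    change that was still possible, following equations (6a)-(6c)."""
    if y2 > y1:
        return 100.0 * (y2 - y1) / (y_max - y1)  # (6a): room to the ceiling
    if y2 < y1:
        return 100.0 * (y2 - y1) / (y1 - y_min)  # (6b): room to the floor
    return 0.0                                   # (6c): no change

# Hypothetical questionnaire score with floor 0 and ceiling 10.
up_near_ceiling = bounded_change(8, 9, 0, 10)   # 1 of 2 possible points: 50.0
up_mid_scale = bounded_change(4, 5, 0, 10)      # 1 of 6 possible points
down_near_floor = bounded_change(2, 1, 0, 10)   # -1 of 2 possible points: -50.0
```

The same absolute change of one point is therefore weighted more heavily when the subject starts close to the ceiling (or floor) than when the subject starts in the middle of the scale.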


2.2 Changes between baseline and 
follow-up: dichotomous (and categorical) 
outcome variables 


For dichotomous outcome variables the sit- 
uation is slightly more difficult than was 
described for continuous outcome variables. 
This is due to the fact that a change in a 
dichotomous outcome variable between subse- 
quent measurements leads to a categorical vari- 
able. First of all, there are subjects who stay in 
the “highest” category, there are subjects who 
stay in the “lowest” category, and there are sub- 
jects who move from one category to another 
(see Figure 18.2). 

In general, for categorical outcome variables 
with C categories, the change between sub- 
sequent measurements is another categorical 
variable with $C^2$ categories. The cross-sectional
analysis of the resulting categorical variable can 
be performed with polytomous/multinomial 
logistic regression analysis, which is now avail- 
able in most software packages. 
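
Constructing the change variable is a matter of combining the two measurements into one code. The sketch below assumes that categories are coded 0 to C-1; the function name, the labels, and the six subjects are hypothetical.

```python
from collections import Counter

def change_category(y1, y2, n_categories=2):
    """Map a (baseline, follow-up) pair of category codes 0..C-1
    to a single code 0..C*C-1."""
    return n_categories * y1 + y2

# For a dichotomous outcome (C = 2) there are 2 * 2 = 4 change categories.
labels = {0: "stays 0", 1: "0 -> 1", 2: "1 -> 0", 3: "stays 1"}

baseline = [0, 0, 1, 1, 0, 1]  # hypothetical subjects
followup = [0, 1, 1, 0, 0, 1]

counts = Counter(
    labels[change_category(a, b)] for a, b in zip(baseline, followup)
)
```

The resulting four-category variable is what would serve as the outcome in a polytomous logistic regression analysis.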
Figure 18.2 Changes in a dichotomous outcome
variable over time result in a categorical outcome
variable


Unfortunately, polytomous logistic regres- 
sion analysis is not much used. Therefore, in 
many studies the resulting categorical outcome 
variable is reduced to a dichotomous outcome 
variable, which can be analyzed with simple 
logistic regression analysis. One widely used 
possibility is to discriminate between subjects 
who showed an “increase” and subjects who 
did not, etc. Nevertheless, in every reduction 
information is lost, and it is obvious that a 
dichotomization is not recommended in most 
research situations. 

Another method with which to analyze 
changes in a dichotomous outcome variable is 
analysis of covariance. Instead of a linear regres- 
sion analysis, logistic regression analysis must 
be used for the dichotomous outcome variable. 

For dichotomous outcome variables, the 
interpretation of the results of a logistic analysis 
of covariance is, however, complicated, espe- 
cially when all four possible changes over time 
occur in the dataset. (For further detail on the 
use of logistic regression to analyze changes 
in dichotomous and categorical outcomes, see 
the chapters by Menard in Section VI of this 
volume.) 


2.3 Changes between baseline and 
follow-up: ordinal outcome variables 


The definition of change between baseline and 
follow-up is even more complicated when ordi- 
nal outcome variables are considered. Ordinal 
scales are characterized not only by an ordering
of categories but also by the fact that the distance
from category to category is not known.
In other words, on ordinal scales, the difference 
between 2 and 3 cannot be assumed the same 
as the difference between 3 and 4 (Stucki et al., 
1996). This implies that analyses of changes in 
ordinal outcome variables are only valid when 
there are no differences at baseline. When there 
are differences at baseline, none of the earlier 
presented methods leads to valid results. When 
an ordinal outcome variable is based on a sum 
score of several items of a questionnaire (which 
is common in health-status measures), Rasch 
analysis (Raczek et al., 1998; MacKnight and 
Rockwood, 2000) can be seen as a sort of recal- 
ibration of the ordinal scale into an interval 
scale. In general, it is suggested to use nonpara- 
metric statistics to analyze changes in ordinal 
outcome variables (Schnell et al., 1995; Sonn 
and Svensson, 1997; Svensson, 1998), or to use 
complicated statistical modeling (Agresti, 1989; 
1999). However, whatever the statistical tech- 
nique used to analyze changes in ordinal out- 
come variables, the results of these analyses 
should be interpreted cautiously. 


2.4 Recommendation 


Although in most studies the absolute change 
between a baseline measurement and a follow- 
up measurement is used to evaluate the effect 
of a certain intervention, in many situations this
is not the most appropriate method, first of 
all because of its assumed negative correlation 
with the initial value (i.e., the phenomenon of 
regression to the mean), and second because of 
its low reliability. For more information on the 
latter, reference is made to Rogosa (1995) who
gives an interesting overview of the “myths and 
methods” in longitudinal research and, in par- 
ticular, the definition of change. 

It is difficult to give straightforward advice 
regarding the definition of change that should 
be used in an RCT with two measurements. The 
choice for a particular method depends on the 
characteristics of the outcome variable. When 
a continuous outcome variable is involved and
there are no anticipated ceiling or floor effects, 
analysis of covariance is recommended because 
the technique corrects (if necessary) for the 
phenomenon of regression to the mean. When 
there are anticipated ceiling or floor effects, 
they should be taken into account. However, 
the best way of analyzing change is (probably) 
a combination of the results of several ana- 
lyses obtained from various (biologically plau- 
sible) definitions of change. When changes in 
a dichotomous outcome variable are analyzed, 
polytomous logistic regression analysis of the 
categorical variable is preferable to (logistic) 
analysis of covariance, because it provides more 
information and the interpretation of the results 
is fairly straightforward. 


3 Experimental research with more 
than one follow-up measurement 


In the past decade, experimental studies with 
only one follow-up measurement have become 
very rare. At least one short-term follow- 
up measurement and one long-term follow-up 
measurement “must” be performed. However, 
more than two follow-up measurements are 
usually performed in order to investigate the 
“development” of the outcome variable, and 
to compare the “developments” among the 
groups. These more complicated experimen- 
tal designs are often analyzed with the sim- 
ple methods that have already been described 
in the earlier paragraphs, mostly by analyzing 
the outcome at each follow-up measurement 
separately, or sometimes even by ignoring the 
information gathered from the in-between mea- 
surements, i.e., only using the last measurement 
as an outcome variable to evaluate the effect of 
the intervention. Besides this, summary statis- 
tics are often used (see Section 3.1). This is 
surprising, because there are statistical methods 
available that can be used to analyze the differ- 
ence in “development” of the outcome variable 
in two or more groups. 


3.1 Continuous outcome variables 


Summary statistics 
There are many summary statistics available 
with which to estimate the effect of an inter- 
vention in an experimental study. In fact, the 
relatively simple analyses carried out in Section 
2 can also be considered as summary statis- 
tics. Depending on the research question to be 
addressed and the characteristics of the out- 
come variable, different summary statistics can 
be used. The general idea of a summary statistic 
is to express the longitudinal development of 
a particular outcome variable as one quantity. 
Therefore, the complicated longitudinal prob- 
lem is reduced to a cross-sectional problem. To 
evaluate the effect of the intervention, the sum- 
mary statistics of the groups under study are 
compared to each other. Table 18.1 gives a few 
examples of summary statistics. 

One of the most frequently used summary 
statistics is the area under the curve [AUC; 
equation (7)]. 


$\mathrm{AUC} = \frac{1}{2} \sum_{t=1}^{T-1} (t_{t+1} - t_t)(Y_t + Y_{t+1})$ (7)


where: AUC = area under the curve; T = number
of measurements; and $Y_t$ = observation of the
outcome variable at time t.

Table 18.1 Examples of summary statistics, which
are frequently used in experimental studies

The mean of all follow-up measurements
The highest (or lowest) value during follow-up
The time needed to reach the highest value or a certain pre-defined level
Changes between baseline and follow-up levels
The area under the curve


When the AUC is used as a summary statistic,
the AUC must first be calculated for each subject,
which is then used as an outcome variable
to evaluate the effect of the therapy under study.
This comparison is simple to carry out with an
independent t-test. When the time-intervals are
equally spaced, the AUC is essentially the same as
the overall mean (apart from a scaling factor and
the half weight given to the first and last
measurements). The AUC becomes interesting
when the time-intervals in the longitudinal
study are unequally spaced, because then the
AUC reflects the “weighted” average in a certain
outcome variable over the total follow-up
period.
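
The AUC of equation (7) for a single subject can be computed as follows. The sketch is illustrative: the function name auc and the measurement times and values (one subject, four unequally spaced measurements) are hypothetical.

```python
def auc(times, values):
    """Area under one subject's curve by the trapezoidal rule, equation (7)."""
    return 0.5 * sum(
        (times[k + 1] - times[k]) * (values[k] + values[k + 1])
        for k in range(len(times) - 1)
    )

times = [0, 1, 3, 6]           # unequally spaced measurement times
values = [4.0, 3.0, 3.0, 2.0]  # outcome at each measurement

area = auc(times, values)      # 0.5 * (1*7 + 2*6 + 3*5) = 17.0

# Dividing by the total follow-up time gives the time-weighted average level.
weighted_average = area / (times[-1] - times[0])
```

In a trial, this value would be calculated for every subject and the resulting per-subject AUCs compared between the groups.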


(M)ANOVA for repeated measurements 

The basic idea behind (multivariate) analysis 
of variance ((M)ANOVA) for repeated measure- 
ments (which is also known as a “general lin- 
ear model for repeated measurements”) is the 
same as for the well-known paired t-test. The 
statistical test is carried out for the T-1 absolute 
differences between subsequent measurements. 
In fact, (M)ANOVA for repeated measurements 
is a multivariate analysis of these T-1 absolute 
differences between subsequent time-points. 
Multivariate refers to the fact that T-1 differences
are used simultaneously as outcome variables. 
Besides the “multivariate” approach, the same 
research question can also be answered with a 
“univariate” approach. This “univariate” pro- 
cedure is comparable to the procedures carried 
out in simple analysis of variance (ANOVA) 
and is based on the “sum of squares”, i.e., 
squared differences between observed values 
and average values. In most software packages, 
the results of both the “multivariate” and “uni- 
variate” approaches are provided at the same 
time. From a (M)ANOVA for repeated measure- 
ments with one dichotomous determinant (i.e., 
intervention versus control group), basically 
three “effects” can be derived (Twisk, 2003):
an overall time effect (i.e., is there a change
over time, independent of the different groups),
an overall group effect (i.e., is there a difference
between the groups independent of time)
and, most important, a group-time interaction
effect (i.e., is there a difference between the
groups in development over time). Although
(M)ANOVA for repeated measurements is very
often used, it has a few major drawbacks. First
of all, it can only be applied to complete
cases; all subjects with one or more missing
observations are excluded from the analyses.
Second, (M)ANOVA for repeated measurements is
mainly based on statistical significance testing, 
while there is more interest in effect estimation. 
Because of this, nowadays sophisticated statis- 
tical techniques, such as generalized estimating 
equations (GEE) or random coefficient analy- 
sis (see below), are becoming more and more 
popular. 


(M)ANOVA for repeated measurements 
corrected for the baseline value 

When the baseline values are different in the 
groups to be compared, it is often suggested 
that a (M)ANOVA for repeated measurements
should be performed, correcting for the base- 
line value of the outcome variable. It should 
be noted carefully that when this procedure 
(which is also known as (multivariate) analysis 
of covariance, i.e., (M)ANCOVA) is performed 
the baseline value is both an outcome vari- 
able (i.e., to create the difference between the 
baseline value and the first follow-up measure- 
ment) and a covariate. In some software pack- 
ages (such as SPSS) this is not possible, and 
therefore an exact copy of the baseline value 
must be added to the model. 


Sophisticated analysis 

The questions answered by (M)ANOVA for 
repeated measurements could also be answered 
by sophisticated methods, such as generalized 
estimating equations (GEE) or random coeffi- 
cient analysis/multilevel analysis. The advan- 
tage of the sophisticated methods is that all 
available data is included in the analysis, while 
with (M)ANOVA for repeated measurements 
(and therefore also with (M)ANCOVA) only 
those subjects with complete data are included. 
Another important advantage of the sophisti- 
cated analyses is that they are regression tech- 
niques, from which the effect estimates (i.e., the 
magnitude of the effect of the intervention) and 
the corresponding confidence intervals can be
derived easily. 

The general idea behind all statistical tech- 
niques to analyze longitudinal data is that 
because of the dependency of observations 
within a subject a correction must be made 
for “subject”. The problem, however, is that 
the variable “subject” is a categorical variable 
that must be represented by dummy variables. 
Suppose there are 200 subjects in a particu- 
lar study. This means that 199 dummy vari- 
ables are needed to correct for subject. Because 
this is practically impossible, the correction for 
“subject” has to be done in a different way 
and the different longitudinal techniques differ 
from each other in the way they perform that 
correction. 

Within GEE, the correction for the depen- 
dency of observations is done by assuming (a 
priori) a certain “working” correlation structure 
for the repeated measurements of the outcome 
variable (Zeger and Liang, 1986; Liang and 
Zeger, 1986). Depending on the software pack- 
age used to estimate the regression coefficients, 
different correlation structures are available. 
They basically vary from an “exchangeable”
(or “compound symmetry”) correlation struc- 
ture, i.e., the correlations between subsequent 
measurements are assumed to be the same, 
irrespective of the length of the measurement 
interval, to an “unstructured correlation struc- 
ture”. In this structure no particular structure 
is assumed, which means that all possible cor- 
relations between repeated measurements have 
to be estimated. 

In the literature it is assumed that GEE 
analysis is robust against a wrong choice for 
a correlation structure, i.e., it does not mat- 
ter which correlation structure is chosen, the 
results of the longitudinal analysis will be more 
or less the same (Liang and Zeger, 1993; Twisk, 
2004). However, when the results of analysis 
with different working correlation structures 
are compared to each other, the magnitudes of
the regression coefficients are different (Twisk,
2003). It is therefore important to consider which
correlation structure should be chosen for the 
analysis. Although the unstructured working 
correlation structure is always the best, the sim- 
plicity of the correlation structure also has to 
be taken into account. The number of parame- 
ters (in this case correlation coefficients) which 
needs to be estimated differs for the various 
working correlation structures. In the exam- 
ple dataset with six repeated measurements, 
for instance, for an exchangeable structure only 
one correlation coefficient has to be estimated, 
while for the unstructured correlation structure, 
15 correlation coefficients must be estimated. 
As a result, the power of the statistical anal- 
ysis is influenced by the choice for a certain 
structure. The best option is therefore to choose 
the simplest structure which fits the data well. 
The first step in choosing a certain correla- 
tion structure can be to investigate the within- 
person correlation coefficients for the outcome 
variable. It should be kept in mind that when 
analyzing covariates, the correlation structure 
can change (i.e., the choice of the correlation
structure should preferably be made conditional
on the covariates).
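
The number of correlation coefficients implied by each working structure follows directly from the number of repeated measurements. The sketch below covers only the two structures named in the text; the function names are hypothetical and the value 0.6 is an arbitrary example.

```python
def n_correlations(structure, n_times):
    """Number of correlation coefficients to be estimated."""
    if structure == "exchangeable":
        return 1                             # one common correlation
    if structure == "unstructured":
        return n_times * (n_times - 1) // 2  # one per pair of time-points
    raise ValueError(f"unknown structure: {structure}")

def exchangeable_matrix(n_times, rho):
    """Working correlation matrix with the same correlation rho
    between every pair of measurements."""
    return [
        [1.0 if r == c else rho for c in range(n_times)]
        for r in range(n_times)
    ]

# Six repeated measurements, as in the example dataset mentioned above.
k_exchangeable = n_correlations("exchangeable", 6)  # 1
k_unstructured = n_correlations("unstructured", 6)  # 15
R = exchangeable_matrix(3, 0.6)
```

With six measurements there are 6 x 5 / 2 = 15 distinct pairs of time-points, which is where the 15 coefficients of the unstructured correlation structure come from.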

Random coefficient analysis (Laird and Ware, 
1982) is also known as multilevel analysis 
(Goldstein, 2003; Twisk, 2006), hierarchical 
modeling, or mixed effects modeling. As has 
been mentioned before, the general idea behind 
all longitudinal statistical techniques is to cor- 
rect for “subject” in an efficient way. Correcting 
for “subject” actually means that for all subjects 
in the longitudinal study, different intercepts 
are estimated. The basic principle behind the 
use of random coefficient analysis in longitudi- 
nal studies is that not all separate intercepts are 
estimated, but that (only one) variance of those 
intercepts is estimated, i.e., a random inter- 
cept. It is also possible that not only the inter- 
cept is different for each subject, but that also 
the development over time is different for each 
subject, in other words, there is an interaction 
between “subject” and time. In this situation the 
variance of the regression coefficients for time
can be estimated, i.e., a random slope for time.
In fact, these kinds of individual interactions
can be added to the regression model for all 
covariates. In a regular RCT, however, assum- 
ing a random slope for the intervention effect 
is not possible, because the intervention vari- 
able is time-independent (Twisk, 2006). When 
a certain subject is assigned to either the inter- 
vention or control group, that subject stays in 
that group throughout the intervention period. An
exception is the crossover trial, in which the 
subject is his own control and the interven- 
tion variable is time-dependent. In this situa- 
tion the intervention effect can be different for 
each subject and therefore a random slope for 
the intervention variable can be assumed. For 
random coefficient analysis, one has to choose 
which coefficients have to be assumed random. 
This choice is easier than the choice for a work- 
ing correlation structure in GEE analysis. This 
is due to the fact that most standard software, 
which can be used for random coefficient anal- 
ysis, provides a likelihood ratio chi-square test 
using −2 log likelihood values of each model,
which can be used to evaluate different models. 


Comparison between GEE analysis and 
random coefficient analysis 

Both GEE and random coefficient analyses 
are highly suitable to analyze longitudinal 
experimental data, because in both methods a 
correction is made for the dependency of the 
observations within one subject. The question 
then arises: Which of the two methods is better? 
Unfortunately, no clear answer can be given. 
For continuous outcome variables, GEE analy- 
sis with an exchangeable correlation structure 
is the same as a random coefficient analysis 
with only a random intercept. The correction 
for the dependency of observations with an 
exchangeable “working correlation” structure is 
the same as allowing individuals to have ran- 
dom intercepts. When the dependency of obser- 
vations is slightly more complicated, GEE anal- 
ysis with a different correlation structure can 


be used or random coefficient analysis with 
additional random regression coefficients for 
other variables (e.g., time). Although random 
coefficient analysis is slightly more flexible, it 
should be realized that “regular” random coef- 
ficient analysis is limited by the fact that the 
random regression coefficients are assumed to 
be normally distributed, i.e., the variance of the 
intercepts is estimated by assuming a normal 
distribution. 
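
The correspondence between a random intercept and an exchangeable correlation structure can be made visible with a simulation. The sketch below is an illustration under assumed values: normally distributed random intercepts and residuals, both with standard deviation 1, three measurements per subject, and a hypothetical helper function pearson. Under these assumptions every pair of measurements should show roughly the same correlation, equal to the share of the total variance that lies between subjects.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(2)
n_subjects, n_times = 4000, 3
sd_between, sd_within = 1.0, 1.0  # assumed values for this illustration

data = []
for _ in range(n_subjects):
    u = random.gauss(0, sd_between)  # subject-specific random intercept
    data.append([10 + u + random.gauss(0, sd_within) for _ in range(n_times)])

t1, t2, t3 = zip(*data)  # one column per time-point
r12, r13, r23 = pearson(t1, t2), pearson(t1, t3), pearson(t2, t3)

# Correlation implied by the random intercept model: 0.5 for these values.
implied = sd_between**2 / (sd_between**2 + sd_within**2)
```

All three correlations are close to the implied value, irrespective of the distance between the measurements, which is exactly the pattern an exchangeable working correlation structure assumes.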


Correcting for baseline values in 
sophisticated analyses? 

It was already mentioned before that when base- 
line values differ between the intervention and 
control group it is (sometimes) necessary to 
correct for these differences, i.e., to correct for 
the phenomenon of regression to the mean. So, 
when more than one follow-up measurement 
is considered, this means that the whole lon- 
gitudinal development of the outcome variable 
over time is corrected for the baseline value (see 
equation 8). 


$Y_{it} = \beta_0 + \beta_1 X_i + \beta_2 Y_{it_0} + \cdots$ (8)


where: $Y_{it}$ = observations for subject i at time t;
$\beta_0$ = intercept; $\beta_1$ = regression coefficient for $X_i$;
$X_i$ = intervention variable; $\beta_2$ = regression coefficient
for observation at $t_0$; and $Y_{it_0}$ = observation
for subject i at time $t_0$.

The other possibility is to use a so-called 
autoregressive model, in which the whole development 
of the outcome variable is not corrected 
for the baseline value; instead, each measurement 
of the outcome variable is corrected for the value 
of the outcome variable one time-point earlier 
[see equation (9)]. 


$Y_{it} = \beta_0 + \beta_1 X_i + \beta_2 Y_{i,t-1} + \cdots$ (9) 


where: $Y_{it}$ = observations for subject $i$ at time $t$; 
$\beta_0$ = intercept; $\beta_1$ = regression coefficient for $X_i$; 
$X_i$ = intervention variable; $Y_{i,t-1}$ = observation 
for subject $i$ at time $t-1$; and $\beta_2$ = regression 
coefficient for observation at $t-1$ (autoregression 
coefficient). 

The idea underlying the autoregressive model 
is that the value of an outcome variable at 
each time-point is primarily influenced by the 
value of this variable one measurement earlier. 
To estimate the “real” influence of the inter- 
vention variable on the outcome variable, the 
model should therefore correct for the value 
of the outcome variable at time-point t-1. In 
fact, with an autoregressive model, the “corrected” 
changes between subsequent measurements 
are compared between the intervention 
and the control group. 
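
The difference between the two corrections is easiest to see in how the analysis rows are built. The following is a minimal sketch, assuming one subject's measurements are held in a list ordered by time-point; the function names and the example values are hypothetical, and the sketch only prepares the rows and does not fit a model.

```python
def baseline_adjusted_rows(y, x):
    """Rows (outcome, intervention, covariate) in the spirit of equation (8).

    y is one subject's series [y_t0, y_t1, ..., y_tT]; every follow-up
    measurement is paired with the same baseline value y_t0.
    """
    return [(y[t], x, y[0]) for t in range(1, len(y))]


def autoregressive_rows(y, x):
    """Rows in the spirit of equation (9).

    Each measurement is paired with the value one time-point earlier.
    """
    return [(y[t], x, y[t - 1]) for t in range(1, len(y))]


# Hypothetical subject in the intervention group (x = 1):
# a baseline value followed by two follow-up measurements.
series = [10.0, 8.0, 7.0]
print(baseline_adjusted_rows(series, x=1))  # [(8.0, 1, 10.0), (7.0, 1, 10.0)]
print(autoregressive_rows(series, x=1))     # [(8.0, 1, 10.0), (7.0, 1, 8.0)]
```

The first follow-up row is identical in both layouts; the layouts diverge from the second follow-up onwards, where the covariate is either still the baseline value or the previous measurement.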

Although the longitudinal analysis of covari- 
ance is mostly used, it is questionable whether 
or not this is correct. In fact, the correction for 
baseline overestimates the therapy effect. When 
there are, for instance, two follow-up measure- 
ments, the short-term effect is doubled in the 
estimation of the overall therapy effect. This situation 
is illustrated in Figure 18.3. As can be 
seen from this figure, this overestimation is not 
present when an autoregressive model is used. 

Figure 18.3 The difference between two approaches that can be used in the analysis of an experimental 
longitudinal study. The effects a1 and a2 are detected by autoregression analysis (equation 9), while the effects 
b1 and b2 are detected by the longitudinal analysis, correcting for the baseline value (equation 8). For Figure 
18.3a, the two methods will show comparable results (a1 = b1 and a2 = b2). For Figure 18.3b, the longitudinal 
analysis, correcting for baseline, will detect a stronger decline than the autoregression analysis (a1 = b1 and 
a2 < b2). The situation in Figure 18.3c will produce the same result as in Figure 18.3b (i.e., a1 = b1 and 
a2 < b2). 


3.2. Dichotomous outcome variables 


In many experimental studies, the outcome 
variable of interest is not a continuous one, but 
a dichotomous one. A subject is recovered or 
not recovered, a subject experiences a relapse 
of a certain chronic disease or not, etc. When 
there is only one follow-up measurement, the 
analysis of a dichotomous outcome variable is 
not very complicated and can be performed 
with either logistic regression analysis or survival 
analysis. This is because, at baseline, all subjects 
are usually in the same state; i.e., they are all 
patients with the disease that the particular 
intervention targets. The 
biggest problems occur when the dichotomous 
outcome variable can change over time, i.e., 
when the event of interest is recurrent. Figure 
18.4 shows an example of a study in which 
the event of interest (i.e., treatment success) is 
recurrent. 

Figure 18.4 An experimental study with recurrent 
events (proportion of subjects experiencing the event 
in the control and intervention groups at follow-up 
measurements 0 to 5) 

Basically, the different statistical techniques 
to analyze recurrent event data can be divided 
into “naive” techniques and longitudinal techniques. 
“Naive” techniques are characterized by 
either ignoring the existence of recurrent events 
or ignoring the fact that the recurrent events 
within subjects are correlated. Longitudinal 
techniques, on the other hand, are characterized 
by the fact that the whole pattern of recurrent 
events over time is analyzed, taking into 
account that the recurrent events are correlated 
within subjects. Despite the fact that there are 
many statistical techniques available to analyze 
recurrent event data (Eisen, 1999), for most 
researchers it is rather difficult to choose the 
proper technique to answer the research question 
they are interested in. Reviewing the literature, 
it is rather surprising that most authors 
use “naive” statistical techniques to analyze 
their study outcomes (Stürmer et al., 2000). The 
most commonly used “naive” statistical techniques 
are “naive” in the sense that they do not use 
all available data, but only one observation for 
each patient. First, a logistic regression analysis 
can be performed in order to analyze the difference 
in the proportion of patients with “treatment 
success” at the end of the study. Second, 
a survival analysis (i.e., Cox proportional hazards 
regression) can be performed with the first 
experienced event (i.e., “treatment success”) 
and the time to that event as an outcome variable. 
A possible explanation for the popularity 
of using “naive” techniques is that most longitudinal 
techniques are only described in specific 
statistical literature, which is difficult to understand 
for most (non-mathematical) researchers 
(Clayton, 1994; Lagakos, 1997). However, the 
general ideas behind these techniques are not 
as difficult as often suggested. 


Longitudinal techniques 

The longitudinal techniques to analyze 
recurrent events can be divided into sur- 
vival approaches and (longitudinal) logistic 
regression approaches. Regarding survival 
approaches, Cox proportional hazards regres- 
sion for recurrent events can be performed. 
Although there are different estimation proce- 
dures available (Kelly and Lim, 2003) the general 
idea behind Cox proportional hazards regression 
for recurrent events is that the different time 
periods are analyzed separately, adjusted for the 
fact that the time periods within one patient 
are dependent. The idea of this adjustment 
is that the standard error of the regression 
coefficient of interest is increased in proportion 
to the correlation of the observations within 
one patient. One of the problems in using Cox 
proportional hazards regression for recurrent 
events is the question of how to define the “time 
at risk”, especially when the events under 
study are not short-lasting events but can be 
long-lasting, i.e., continuing over more than 
one time-point (so that they can be considered as 
“states”). Figure 18.5 shows a few possibilities 
to define the “time at risk” in a study where the 
dichotomous outcome variable is recurrent: (1) 
the “counting” approach. Each time period is 
analyzed separately, assuming that all subjects 
are at risk at the beginning of each period, 
irrespective of the situation at the end of the 
foregoing period; (2) the “total time” approach. 
This is comparable to the “counting” approach. 
However, in the “total time” approach, the 
starting point for each period is the beginning of 
the study; (3) the “time to event” approach. For 
instance, when the event of interest is treatment 
success, in this approach only the transitions 
from “no treatment success” to “treatment suc- 
cess” are taken into account. So, if for a subject 
(or patient) the treatment was “successful” at 
the first follow-up and stays “successful” at all 
repeated measurements, only the first time is 
taken into account in the analysis. When for 
another subject (or patient) the treatment was 
“successful” at the first follow-up, and “not suc- 
cessful” at the second follow-up, that particular 
subject (or patient) is again at risk from the first 
follow-up onwards until the treatment for that 
subject (or patient) is “successful” for the second 
time, or until the follow-up period ended. 
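
The three definitions of the “time at risk” can be sketched as rows of (start, stop, event) for a single subject. This is one possible reading of the descriptions above, not the chapter's own implementation. It assumes that the subject is observed at fixed measurement times, that the state at baseline is “no treatment success” (coded 0), and that the event indicator of a period is the state observed at its end. The function names and the end of follow-up at time 48 are hypothetical.

```python
def counting_rows(times, states):
    """Each period starts at the previous measurement, whatever the previous state."""
    return [(times[k - 1], times[k], states[k]) for k in range(1, len(times))]


def total_time_rows(times, states):
    """Each period starts at the beginning of the study."""
    return [(times[0], times[k], states[k]) for k in range(1, len(times))]


def time_to_event_rows(times, states):
    """Only transitions from 0 (no success) to 1 (success) count as events.

    After a relapse the subject is at risk again, with the clock starting
    at the previous event; an open period is closed at the end of follow-up.
    """
    rows = []
    start = times[0]
    at_risk = True
    for k in range(1, len(times)):
        if at_risk and states[k] == 1:
            rows.append((start, times[k], 1))
            start = times[k]
            at_risk = False
        elif not at_risk and states[k] == 0:
            at_risk = True  # relapse observed: at risk again
    if at_risk and start < times[-1]:
        rows.append((start, times[-1], 0))  # no further event before follow-up ended
    return rows


# Subject with an event at 12, a relapse at 24, and an event at 36 that continues.
times = [0, 12, 24, 36, 48]
states = [0, 1, 0, 1, 1]
print(counting_rows(times, states))       # [(0, 12, 1), (12, 24, 0), (24, 36, 1), (36, 48, 1)]
print(total_time_rows(times, states))     # [(0, 12, 1), (0, 24, 0), (0, 36, 1), (0, 48, 1)]
print(time_to_event_rows(times, states))  # [(0, 12, 1), (12, 36, 1)]
```

The sketch makes the practical consequence visible: the same subject contributes four rows under the first two definitions and only two under the third.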

Figure 18.5 Possible definitions of the time at risk 
to be analyzed with Cox regression for recurrent 
events for a subject in a hypothetical study who 
experienced an event at time 12, relapsed at time 24, 
experienced an event again at time 36 which 
continued till the end of the study 

Regarding the logistic regression approaches, 
the two techniques that have already been discussed 
for continuous outcome variables can 
also be used to analyze the recurrent event data, 
i.e., GEE analysis and random coefficient analysis 
(see page 284). When dichotomous outcome 
variables are considered, a logistic version 
of both GEE analysis and random coefficient 
analysis is available. It should be noted that for 
dichotomous outcome variables, (M)ANOVA 
for repeated measurements cannot be used. 
In contrast to the analysis with a continuous 
outcome variable, the two longitudinal logis- 
tic regression techniques (i.e., GEE analysis 
and random coefficient analysis) lead to dif- 
ferent results. Basically, both “longitudinal” 
techniques take all measurements into account, 
and use a logistic regression approach with 
a correction for the dependency of the observations, 
again either by assuming a certain 
“working” correlation structure (GEE analysis) 
or by allowing random regression coefficients 
(random coefficient analysis). Because a logistic 
regression approach is used, the regression 
coefficients derived from both techniques can 
be transformed into odds ratios. The difference 
between the two techniques is that GEE analysis 
is a so-called population average approach, 
while random coefficient analysis is a so-called 
subject specific approach (Twisk, 2003). The 
different estimation procedures cause the difference 
in the magnitude of the odds ratios, 
which is always in favor of the random coefficient 
analysis, i.e., the “effects” estimated with 
random coefficient analysis are always bigger 
than the “effects” estimated with GEE analysis 
(see Figure 18.6). Because the standard errors 
are also bigger for the random coefficient analysis 
(and therefore the 95% confidence intervals 
wider), the corresponding p-values are 
not much different, and when conclusions are 
based on these p-values, they will not be much 
different when using GEE analysis or random 
coefficient analysis. However, when the conclusions 
are based on the magnitude of the odds 
ratios, the conclusions will differ remarkably 
between the two techniques. 

Figure 18.6 The “population average” approach of GEE analysis and the “subject specific” approach of 
random coefficient analysis, illustrating both the situation with a continuous outcome variable (Figure 18.6a 
and b), and the situation with a dichotomous outcome variable (Figure 18.6c and d). 
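
Why a subject specific odds ratio is larger than the corresponding population average odds ratio can be illustrated numerically. The sketch below does not fit a GEE or a random coefficient model. It assumes a logistic model with a normally distributed random intercept, averages the subject specific probabilities over that distribution on a simple grid, and computes the odds ratio from the averaged probabilities. All names and parameter values are hypothetical.

```python
import math


def expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-v))


def population_average_or(beta0, beta1, sigma, n_grid=4001, width=8.0):
    """Odds ratio after averaging subject specific probabilities.

    Random intercepts are assumed to follow N(0, sigma^2); the average is
    taken over a grid of standard normal values from -width to +width.
    """
    step = 2.0 * width / (n_grid - 1)
    zs = [-width + i * step for i in range(n_grid)]
    ws = [math.exp(-0.5 * z * z) for z in zs]  # unnormalized normal weights
    total = sum(ws)
    p0 = sum(w * expit(beta0 + sigma * z) for w, z in zip(ws, zs)) / total
    p1 = sum(w * expit(beta0 + beta1 + sigma * z) for w, z in zip(ws, zs)) / total
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical subject specific log odds ratio of 1 for the intervention.
subject_specific_or = math.exp(1.0)
marginal_or = population_average_or(beta0=-1.0, beta1=1.0, sigma=2.0)
print(round(subject_specific_or, 2), round(marginal_or, 2))
```

With no between-subject variation (sigma = 0) the two odds ratios coincide; as the variance of the random intercept grows, the averaged odds ratio moves toward 1 while the subject specific odds ratio stays the same.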

It should further be noted that the estimations 
of the regression coefficients (i.e., odds ratios) 
with random coefficient analyses of recurrent 
events can be very complicated and often lead 
to unstable results. Furthermore, the results 
of these analyses can differ between software 
packages (Yang and Goldstein, 2000; Lesaffre 
and Spiessens, 2001; Twisk, 2003). 


Comments 

In the previous section an overview was given 
of different techniques that can be used for 
the analysis of recurrent event data. The sim- 
plest and probably most illustrative way of 
describing recurrent event data is plotting the 
proportion of subjects experiencing the event 
at each time-point (see Figure 18.4) or show- 
ing the different response patterns observed. 
Although both can give a nice overview, it is 
difficult to analyze these patterns. Therefore, 
several statistical analyses can be performed 
on the recurrent event data. It is striking that 
the techniques that use all available data can 
give totally different results than the techniques 
that use only part of the data (Twisk et al., 
2005). The choice for a particular technique 
highly depends on the research question. With 
the logistic regression analysis using the data 
assessed at the end of the follow-up period the 
long-term effect of the intervention is analyzed, 
while with the “naive” Cox regression analy- 
sis the short-term effect of the intervention is 
analyzed. With the longitudinal techniques that 
use all available data, the overall intervention 
effect, i.e., the whole development over time, 
is analyzed. Note, too, that Cox regression analysis 
outputs hazard ratios, which are different 
from and cannot be used interchangeably with 
the odds ratios output in logistic regression 
analysis. 

A major problem of using Cox regression 
for recurrent events on the other hand is the 
assumption of proportional hazards over time; 
an assumption that does not hold in many situ- 
ations. When the proportional hazards assump- 
tion does not hold, it is possible to divide 
the follow-up period into several subperiods 
and calculate different hazard ratios for each 
subperiod. Furthermore, compared to the lon- 
gitudinal logistic regression approaches the 
possibilities to correct for the dependency of 
observations in using Cox regression are rather 
limited. In fact the correction only influences 
the standard error of the regression coefficient, 
i.e., the width of the 95% confidence interval 
around the hazard ratio. The point estimate is 
equal to the point estimate derived from an 
analysis when the observations are considered 
to be independent. 

All statistical techniques discussed in light 
of the analysis of recurrent events are either an 
extension of Cox proportional hazards regres- 
sion or logistic regression. Therefore, issues 
such as effect modification and confounding 
can be handled in exactly the same way as 
in the “classical” application of these tech- 
niques. Of course, due to the longitudinal 
nature of the data, possible effect modifiers or 
confounders can be time-independent as well 
as time-dependent. 

The example shown in Figure 18.4 is an 
example of a study with single type event data. 
Only one kind of event (i.e., “treatment suc- 
cess”) is used as outcome. Although the inter- 
pretation of the results is slightly different, it 
is obvious that the same kind of approaches 
as described in this paper can be used for 
analysis of multitype event data such as tumors 
at different sites, different kinds of infection, 
etc. (Wei and Glidden, 1997). 


4 General recommendation 


The way the results of an experimental 
study should be analyzed highly depends on 
the research question of interest. If one is 
only interested in a particular short-term or 
long-term result, simple techniques are highly 
appropriate. However, if the development of 
a particular outcome is of interest, the longi- 
tudinal statistical techniques are necessary to 
answer the accompanying research question. 
For continuous outcome variables, the results 
of GEE analysis and random coefficient anal- 
ysis are comparable, but for dichotomous out- 
come variables, this is not the case. When 
a dichotomous outcome variable is analyzed 
and when the longitudinal dataset consists of 
discrete time-points, GEE analysis or random 
coefficient analysis can be used, but GEE is 
to be recommended because of the availabil- 
ity of the population average approach and 
the relatively “simple” and robust estimation 
procedures compared to random coefficient 
analysis. When the events can occur continu- 
ously, Cox proportional hazards regression for 
recurrent events must be used, but special atten- 
tion has to be given to the definition of the “time 
at risk” and to the assumption of proportional 
hazards. 


Chapter 19 


Analyzing longitudinal qualitative 
observational data 
Johnny Saldaña 


1 Analyzing longitudinal qualitative 
observational data 


This chapter assumes that readers already have 
experience writing detailed and useful field- 
notes through a longitudinal time period (see 
Kathleen M. DeWalt and Billie R. DeWalt’s Par- 
ticipant Observation: A Guide for Fieldworkers 
[2002]; Robert M. Emerson, Rachel I. Fretz, and 
Linda L. Shaw’s Writing Ethnographic Field- 
notes [1995]; and James P. Spradley’s Partici- 
pant Observation [1980] for examples). What 
I address below is my signature approach to 
analyzing fieldnote data derived from long-term 
observations of participants in social settings. 
Due to restrictions of chapter length, I can- 
not detail the theoretical underpinnings and 
nuances of my eclectic methods of longitudinal 
qualitative data analysis. Instead, I will focus on 
pragmatic matters and refer readers to my book, 
Longitudinal Qualitative Research: Analyzing 
Change through Time (2003) for a more thor- 
ough discussion of the complexities involved 
with this topic. 


1.1 Change processes 


I propose that researchers are more likely to 
discern participant or environmental change (if 
any) from longitudinal qualitative data if we 
code them using indicators of change processes. 
These processes were adapted from a litera- 
ture review of quantitative and qualitative lon- 
gitudinal methods (e.g., Fullan, 2001; Huber 
and Van de Ven, 1995; Kelly and McGrath, 
1988; Royce and Kemper, 2002; Menard, 
1991; Miles and Huberman, 1994; Strauss 
and Corbin, 1998; Sztompka, 1993), and con- 
structed from personal longitudinal research 
experiences (Saldaña, 1995, 1996, 1997, 1998a, 
1998b, 2005). Just as there is statistical increase, 
decrease, constancy, idiosyncrasy and the like 
in quantitative data, so too can there be qualita- 
tive increase, decrease, constancy, idiosyncrasy 
and the like within and among participants in 
social settings. 

Coding qualitative data is a preliminary yet 
vital function of analysis, for the process 
involves the researcher’s comparison of data 
from apportioned time periods and attributing 
difference and thus possible change (if any) 
to written field observations. In the methods 
I propose in this chapter, coding and conse- 
quent analysis adhere to a particular reper- 
tory of change processes, which can be seen 
in abbreviated form in Figure 19.1—a longi- 
tudinal qualitative data summary matrix. This 
matrix enables researchers to pool, summarize, 
and transfer qualitative fieldnote data from a 
selected time period of a longitudinal study 
onto a readable one-page format. 

Figure 19.1 Longitudinal qualitative data summary matrix. The page is headed by the fields 
DATA TIME POOL/POND (FROM // THROUGH //), STUDY, and RESEARCHER(S), with the note “when possible 
or if relevant, note specific days, dates, times, periods, etc. below; use appropriate DYNAMIC 
descriptors”; it closes with PRELIMINARY ASSERTIONS AS DATA ANALYSIS PROGRESSES (refer to previous 
matrices) and THROUGH-LINE (in progress). 

Imagine that one matrix page holds summary 
observations from three months’ worth of field- 
work. And if the study progresses through two 
years, then there would be eight pages total 
of longitudinal qualitative data. I’ve developed 
several metaphors to illustrate how multiple 
matrix sheets arranged in chronological order 
work together as a strategy for analyzing partici- 
pant change. Think of each three-month page 
as an animator’s cartoon cell, whose artwork 
changes subtly or overtly with each successive 
drawing to suggest movement and change. Or, 
imagine that each matrix sheet is a monthly 
page from a calendar, which suggests a chrono- 
logical progression of time and change as each 
page is turned. Or, imagine that each matrix 
page is a photograph of the same child taken at 
different intervals across time, so that each suc- 
cessive photo reveals growth and development. 
Finally, consider that each matrix might also 
represent a page from a personal diary, whose 
chronological flow of entries tells a narrative of 
what happened and then what happened next. 

The matrix may, at first glance, appear con- 
ceptually or theoretically abhorrent to some, 
and admittedly there is a “breaking in” period 
to using it effectively. Nevertheless, it’s a model 
I’ve tested and modified from its original design 
(Saldaña, 2003, pp. 54-55, 172; Saldaña, 2005), 
and now propose it as a system for longitudi- 
nal qualitative data entry. Note that I did not 
write that this is a system for longitudinal qual- 
itative data analysis, for the analytic thinking 
must still be done by you. The primary functions 
of the matrix are to manage data and to 
reduce and categorize field observations from 
a selected time period to assist the researcher 
with “analysis at a glance,” if you will. 


1.2 The matrix 


The chapter will now focus on the inventory 
of change processes from Figure 19.1 in more 
detail. Each cell also provides an opportunity 
to briefly discuss major concepts related to longitudinal 
qualitative data analysis. 


DATA TIME POOL/POND 

The top entry records the start and end dates of 
a particular period of time from the longitudinal 
research study. (A “pond” is a smaller or shorter 
portion from a “pool” of data.) Each pool/pond 
and thus page can consist of a month’s, year’s, 
or even several years’ worth of summary data. 
Though this may be stating the obvious, it is 
critical that you routinely note the date and 
time of all observations in your fieldnotes with 
attention to such time-related contexts as the 
day of the week, the number of days into a 
school year, the season, holiday or ceremonial 
preparations in progress, ages of participants 
(in years and months), and culturally specific 
conceptions of time (see Levine, 1997). 

Each matrix page does not have to include 
data from a standardized timeframe or regularly 
fixed time interval, such as the end of one aca- 
demic year or grade level for studies in educa- 
tion. Depending on the length of your project 
and as you’re observing and collecting data, you 
may notice clusters of similarities within partic- 
ular varying timeframes such as three months, 
seven months, two years, then ten months. This 
researcher-generated division of total fieldwork 
time is a task that “enables you to exam- 
ine the dynamics of change such as duration, 
frequency, and tempo, which then supports 
the development of conceptual phases, stages, 
cycles or other rhythms of human action” 
(Saldaña, 2003, p. 160). 

The first matrix in the series—the “genesis” 
page—is the most difficult to complete, for 
it contains baseline data for comparison with 
other future matrix pages. The researcher relies 
on fieldnotes that have accumulated through 
the first phase, stage, or cycle of fieldwork that 
suggest change. For example, one of my longitu- 
dinal case studies focused on the artistic devel- 
opment of Barry (pseudonym), a male from ages 
5 through 26 (see Saldaña, 1995, 1998a, 1998b, 
2005). Though I had been analyzing the data 
periodically at various time periods during this 
young man’s life course, the final analytic ven- 
ture happened when he turned age 26, the time 
when both Barry and I agreed that the study had 
come to a conclusion. The first pool of data for 
the genesis matrix page consisted of observa- 
tions from ages 5.5 through 12.5 [years.months]; 
the second pool from ages 12.6 through 16.4; 
the third pool from ages 16.5 through 18.5; the 
fourth pool from ages 18.5 through 23.3; the 
fifth pool from ages 23.4 through 23.11; and 
the sixth and final pool from ages 24 through 
26.1. The rationale for the divisions was as 
follows: 


■ Ages 5.5-12.5: Barry’s elementary school 
years, grades K-6, consisted of data that 
tracked his participation in and responses to 
theatre. Though each grade level was a pond 
of data for comparison with other grade levels, 
the total grades K-6 pool was selected 
for one matrix page since this seemed a 
developmentally-appropriate division as an 
educational study. 

■ Ages 12.6-16.4: There would be no direct 
observation of the participant for the next 
four years, yet data from these stages of 
Barry’s life course were provided retrospectively 
in later interviews. This second pool 
included two key epiphanies and thus merited 
a separate matrix page. 

■ Ages 16.5-18.5: Like the first data pool, the 
total grades 10-12 pool (Barry’s high school 
years) was selected for one matrix page since 
this seemed a developmentally-appropriate 
division as an educational study. 

■ Ages 18.5-23.3: This fourth pool of data 
consists of what I labeled, in retrospect, a 
“searching” phase within Barry’s life course. 
The time period covers a few years of post-secondary 
school employment and community 
college education, but the end of the 
time period was determined by a key interview 
with the participant, a data gathering 
experience that suggested he was on the verge 
of discovery within his life course. 

■ Ages 23.4-23.11: The fifth pool of data 
was determined solely by the interview 
schedule: the time period between Barry’s 
last interview at age 23.3, and the next one 
at age 23.11. There was no direct participant 
observation during this time. 

■ Ages 24-26.1: The final pool of data consists 
of a time period in which Barry’s life and 
career goals crystallized. It includes both university 
education and an epiphanic moment 
of discovery, but its conclusion at 26.1 years 
of age was determined solely by an agreement 
between the participant and myself that 
the formal study had come to a satisfactory 
conclusion. 


Hence, the division of data time pools/ponds 
for matrix pages can consist of such traditional 
and socially constructed periods as: schooling, 
periods between data gathering opportunities, 
epiphanies/turning points in a participant’s life 
course, and retrospective division of a partici- 
pant’s life course into phases, stages, or cycles 
of human development. 
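
To see how uneven these researcher-generated divisions are, the lengths of the six data pools can be computed from the ages given in [years.months] notation. The following is a minimal sketch with a hypothetical helper name; the ages are kept as strings because, read as ordinary decimals, 23.1 and 23.10 would be the same number.

```python
def age_to_months(age):
    """Convert an age written as 'years.months' (e.g. '23.11') to months."""
    years, _, months = age.partition(".")
    return int(years) * 12 + (int(months) if months else 0)


# Start and end ages of the six data pools in the study of Barry.
pools = [("5.5", "12.5"), ("12.6", "16.4"), ("16.5", "18.5"),
         ("18.5", "23.3"), ("23.4", "23.11"), ("24", "26.1")]

spans_in_months = [age_to_months(end) - age_to_months(start) for start, end in pools]
print(spans_in_months)  # [84, 46, 24, 58, 7, 25]
```

The pools range from seven months to seven years, which underlines that a matrix page need not cover a standardized time interval.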

Most longitudinal studies with children 
and adolescents report any observed changes 
according to their ages or grade levels— 
traditional developmental markers that provide 
some sense of standardization across several 
disciplines such as psychology, education, and 
sociology. Recent methodology (e.g., Levine, 
1997; Tudge and Hogan, 2005, p. 114), how- 
ever, notes that varying cultural backgrounds of 
participants, ranging from ethnicity to national- 
ity, may influence and affect change across time 
and thus present contradictory findings when 
we compare individuals or groups. Likewise, 
CONTEXTUAL AND INTERVENING CONDI- 
TIONS (see discussion below) such as nontra- 
ditional home environments or personal family 
tragedy, may present different change patterns 
that do not conform to theoretically universal 
patterns of child development. 

I recommend that you accumulate a minimum 
of three matrix pages of data for analysis 
to support any observations and assertions 
of change. Two matrix pages provide only a 
“then and now” comparison, which potentially 
glosses over the processes of participant change. 
Three or more pages enable you to chronologi- 
cally path the journey of participant change (if 
any) through time. Change does not always hap- 
pen during regularly fixed time intervals, and 
change can be such a slippery and elusive pro- 
cess to track that it can only be noticed in ret- 
rospective data analysis rather than during the 
moment or period it occurs. It is the researcher’s 
intuitive but strategic choice to decide which 
data from the corpus constitute a pool or pond. 


STUDY and RESEARCHER(S) 

Inserting the names of the longitudinal study 
and its researcher(s) is fairly obvious, but let 
me offer a few notes on research team mem- 
bers jointly analyzing longitudinal qualitative 
observational data. 

Colleagues, whether fellow researchers or the 
participants themselves, are excellent resources 
for deep and insightful conversations about 
the data. This matrix and its particular change 
processes lend themselves to productive dia- 
log about the actions and phenomena observed 
through time. Reaching consensus on whether 
an observation of change is an INCREASE or 
DECREASE (see discussion below) helps focus 
the interpretations. But be aware that multiple 
researchers responsible for analyzing longitudi- 
nal qualitative data can turn into a longitudinal 
enterprise in and of itself if the dialog becomes 
stalled, particularly during coding processes. 
All research team members should monitor 
themselves and each other on the progress of 
their analytic endeavors and ensure that every- 
one is working toward the ultimate goal: What, 
if anything, has changed through time? 


DYNAMIC Descriptors 

Dynamics refer to the dimensions and variability 
of qualitative data. This note at the top of the 
matrix serves as a reminder to choose descriptive 
words carefully as entries are made in each 
cell. 

The qualitative researcher’s observations of 
change are highly interpretive acts. Few of us 
would disagree that the destruction of the US 
World Trade Center buildings on September 11, 
2001, or the devastation inflicted by hurricane 
Katrina on the city and people of New Orleans 
in late August 2005, were events of tremen- 
dous magnitude and thus classified as EPIPHA- 
NIES or TURNING POINTS using these typical 
categories of change. But my word choices of 
“destruction,” “devastation” and “tremendous 
magnitude” are, in fact, value judgements of my 
observations, for I could have also used such 
terms as “leveling,” “annihilation,” and “catas- 
trophic proportions”—synonymous descriptors 
yet evocative of different meanings. 

Just as the dynamics of observation and 
change are highly interpretive acts, so too 
are the interpretations of whether particu- 
lar observations are increases, decreases, or 
consistencies in certain actions. For example, 
one researcher may note that an adolescent’s 
clothing—from goth one year to punk the next— 
has become “increasingly radical,” while a sec- 
ond may note that the wardrobe has become 
“decreasingly mainstream,” while yet a third 
observes that the clothing styles have remained 
“consistently nontraditional.” 

Quantitative research has the advantage of 
scaling change on continua such as “none at 
all” to “somewhat” to “very much.” Qualita- 
tive research can also apply such descriptors 
to change processes, but since the paradigm’s 
advantage is language rather than statistics, 
words become powerful and, arguably, more 
accurate indicators of observed change. When 
we say that a person has become “more 
conservative” through time, this implies we 
were focusing narrowly on his conservatism 
to begin with and recorded change as a sim- 
ple increase, when we could have plotted a 
personal change of ethos using a qualitative 
continuum: from “hardcore ACLU-er” to a 
“self-proclaimed ‘independent’” to a “battle- 
weary liberal” to “socially conscious yet politi- 
cally moderate.” Through dynamic descriptors, 
we extend beyond a “then and now” model 
and illustrate not just incremental participant 
change but participant transformation. And if 
we are concerned in our analysis with such time 
orientations as tempo, frequency, or duration 
of actions, there is a difference, for example, 
between describing a participant in an admin- 
istrative position who provides “two additional 
yet difficult years of service,” and one who 
provides “twenty-four grueling months of self- 
sacrificing service above and beyond the call 
of duty that deteriorated his physical and emo- 
tional health.” 

I have found that the search for just the 
right word can sometimes confound my analy- 
sis and writing, but it forces me to think deeply 
about how I am representing and presenting my 
observations of change. This reflection is both 
an introspective audit—a “reality check”—of 
qualitative data, and an exercise in confronting 
one’s credibility and trustworthiness with lan- 
guage choices. 

The discussion now turns to the data entry 
cells themselves—bullet points of observations 
that summarize and capture the essences and 
essentials of participant change (if any) through 
time. 


INCREASE/EMERGE 

This cell includes summary observations that 
answer the question, What increases or emerges 
through time? This may include both quanti- 
tative and qualitative observations, yet primar- 
ily the latter. Increases in an individual’s age 
and weight are examples of quantitative change, 
but related qualitative increases and emer- 
gences may include such participant observa- 
tions as “difficulty moving,” “worn and tired 
facial expressions,” and “darker wardrobe
colors” for the particular data pool. Richer
inferential meanings from these patterns can be
constructed by asserting more than just “getting
older.”

As an extended example, in the longitudi- 
nal qualitative study with Barry, the first data 
pool at ages 5.5-12.5 included the following as 
increases and emergences: 


■ additional theatre-viewing experiences
  beyond the treatment
■ parental involvement in nurturing his theatre
  interest
■ at age 12, reflecting on career choices (actor,
  writer, “think tank”)
■ at ages 11.8-12.5, victim of bullying by peers
■ at age 12, counseling for withdrawal and
  depression.


Barry was not formally tracked from ages 
12.6 through 16.4, since the initial longitu- 
dinal study was completed. In retrospective 
accounts during later years, he recalled two 
key epiphanies (a suicide attempt and his 
first formal performance experience) during the 
period I did not directly observe him. Related 
INCREASE/EMERGE data from the second pool 
included: 


■ at ages 12.6-14.6, anxiety from peer bullying
■ hair length
■ new: smoking, illegal drug use
■ age 14.7, attitude “renaissance” from first and
  future performance opportunities.


Follow-up and direct participant observation 
was initiated during Barry’s final secondary 
school years. The third data pool at ages 
16.5-18.5 listed the following increases and 
emergences: 


■ roles in theatre productions
■ concentration during performance work
■ “passion” for the art form
■ new: mentorship from theatre teachers
■ leadership skills
■ new: questioning his spiritual faith/belief
  systems.


The fourth data pool at ages 18.5-23.3 included
the following as increases and emergences: 


■ searching for “artful living”
■ service as a summer camp counselor for
  special populations
■ attending a different church but same faith
■ attending community college for general
  studies
■ learning ASL [American Sign Language]
■ exploring drama therapy as a career
■ new: personal credit card
■ new: prescription medication for bipolarity.


The fifth data pool at ages 23.4-23.11 listed the
following as his increases and emergences in 
actions: 


■ deciding between social work and urban
  sociology as possible majors at the university
■ providing urban ministry for youth
■ new: eyebrow piercing, facial hair, spiked
  hair style.


The sixth and final pool of data at ages 24-26.1
listed as his increases and emergences: 


■ university education: pursuing a bachelor’s
  degree in social work with a minor in
  religious studies
■ new: tattoo on left arm, “fight, race faith”
  (from 2 Timothy 4:7)
■ preaching occasionally at Sunday worship
  services
■ new: disclosure of his father’s past spiritual
  abuse
■ working for “social justice”
■ new: rock climbing as a hobby.


An INCREASE as change differs from the 
next two cells—observations that are CUMU- 
LATIVE, and observations that are SURGES/ 
EPIPHANIES/TURNING POINTS. An increase 
or emergence in qualitative change is a
phenomenon or participant action that appears or
transforms in subtle, smooth, or expected ways.
Cumulative change is a transformation in par- 
ticipant quality as a result of successive expe- 
riences through time. Surges, epiphanies, and 
turning points suggest change that is unchar- 
acteristically rapid or sudden, a revelatory or 
insightful experience for the participant, or an 
event of such significant magnitude that it redi- 
rects the course and flow of a participant’s con- 
sequent actions. 


CUMULATIVE 

This cell includes summary observations that 
answer the question, What is cumulative 
through time? Examples may include: a fifth- 
year teacher’s finely developed expertise with 
classroom management based on her four 
years of previous instructional experience; 
a child’s oral language fluency by age 12; 
and Barry’s accumulated coursework experi- 
ences from community college and university 
education. 

As noted above, cumulative affects are 
changes as a result of successive experiences 
through time. A minimum of three matrix pages 
permits you to track how selected common phe- 
nomena or actions transform in quality. This 
three-cell “time triangulation” (Saldaña, 2003,
p. 164) provides a data-trail of support for any 
assertions related to this type of change process. 
Cumulative change, however, is not always a 
smooth path, particularly with human develop- 
ment’s and social life’s unexpected twists and 
turns along the journey. Hence, the next set of 
change processes needs to be taken into consid- 
eration during analysis. 
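
The three-page minimum described above can be sketched in code. This is only an illustration, assuming the researcher has tagged each summary entry with a common phenomenon: the tag names, the tuple layout, and the function names are hypothetical, and the three coursework entries are paraphrased from Barry's fourth through sixth data pools.

```python
# Hypothetical layout: (matrix page, researcher-assigned tag, summary entry).
# The tags are illustrative codes, not part of the published matrix.
MIN_PAGES = 3  # the chapter's "minimum of three matrix pages"

observations = [
    ("pool 4", "coursework", "attending community college for general studies"),
    ("pool 5", "coursework", "deciding between social work and urban sociology as majors"),
    ("pool 6", "coursework", "pursuing a bachelor's degree in social work"),
    ("pool 6", "hobby", "rock climbing as a hobby"),
]


def data_trail(observations, phenomenon):
    """Return the (page, entry) pairs logged for one tagged phenomenon."""
    return [(page, entry) for page, tag, entry in observations if tag == phenomenon]


def supports_cumulative_claim(trail):
    """True when the phenomenon appears on at least MIN_PAGES distinct pages."""
    return len({page for page, _entry in trail}) >= MIN_PAGES


trail = data_trail(observations, "coursework")
for page, entry in trail:
    print(page, "|", entry)
print("enough pages for time triangulation:", supports_cumulative_claim(trail))
```

A check like this only confirms that a data trail exists. Whether the entries show a transformation in quality remains the researcher's interpretive judgment.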


SURGE/EPIPHANY/TURNING POINT 

This cell includes summary observations that 
answer the question, What kinds of surges, 
epiphanies, or turning points occur through 
time? This category of change includes events, 
experiences, or personal revelations of magni- 
tude in a participant’s life course which signifi- 
cantly alter attitude, value, and belief systems; 
initiate future actions in different directions; or
lead to exponential and multiple consequent 
changes. A surge may be interpreted as an accel- 
erated increase, decrease, etc.; an epiphany as 
an impacting event or revelatory moment; and 
a turning point as a juncture in the life course 
that separates significantly different courses of 
action. Examples may include: the devastation
in the city of New Orleans after Hurricane
Katrina; a woman’s new life course directions 
after a divorce; or Barry’s suicide attempts dur- 
ing his junior high and post-secondary school 
years. 


DECREASE/CEASE 

This cell includes summary observations that 
answer the question, What decreases or ceases 
through time? Examples may include: a decline 
in faculty collegiality after disagreement over 
new directions for a degree program; a decrease 
in self-motivation at the workplace; or Barry’s 
decrease and eventual cessation of illegal drug 
use. 

A cautionary note: since written and recorded 
data themselves serve as evidence for our asser- 
tions, keep in mind that fieldnotes through 
time, particularly in the latter phases, stages, or 
cycles of fieldwork, do not always record what 
was gathered at the onset. Hence, what may 
seem “less of” something as time continues may 
actually be due to the fact that the fieldnotes 
themselves simply have not maintained a con- 
tinuous record of initial phenomena or actions. 


CONSTANT/CONSISTENT 

This cell includes summary observations that 
answer the question, What remains constant 
or consistent through time? Note that it is the 
largest of the descriptive data cells in the top 
row, for my own research experiences suggest 
that a sizable portion of fieldnote observations 
may remain constant and consistent through 
time (assuming no actions of consequential 
magnitude, such as epiphanies). The “recurring
and often regularized features of everyday
life” (Lofland et al., 2006, p. 123) are, after
all, what provide “analytic significance” for the 
social scientist. Examples may include: routine 
workplace methods prescribed by a standard 
operations manual and followed by employees; 
a continuous, perceived “lack of time” by a 
working mother; and Barry’s residence with his 
parents from birth through age 26.1. 

A culture is not a fixed system; it is an 
ever-changing and ever-evolving social organi- 
zation. Yet constancy and consistency suggest 
the existence and thus documentation of pre- 
dictable patterns in a social setting, and change 
may or may not occur within the established 
routines and rituals of daily life. Interpreta- 
tion of this stability can range from the secure 
and steady to the rigid and stagnant, revealing 
“either something significant at work or nothing
out of the ordinary” (Saldaña, 2003, p. 165).
Nevertheless, we cannot discern what is chang- 
ing unless we also know what is not changing. 
And when something does indeed change, we 
need to document how selected areas of partici- 
pant constancy and consistency may have been 
influenced and affected. 


IDIOSYNCRATIC

This cell includes summary observations that 
answer the question, What is idiosyncratic 
through time? Anomalies occur in the daily 
routine lives of participants, and these devi- 
ations merit their own cell for data entry. 
These do not include phenomena or actions 
of magnitude such as epiphanies, but rather 
the “inconsistent, ever-shifting, multidirec- 
tional and, during fieldwork, unpredictable” 
(Saldaña, 2003, p. 166). Examples may include:
a child’s sudden yet fleeting interest in a fic- 
tional superhero; a temporary relocation of per- 
sonnel and operations while an office facility 
undergoes repairs; and Barry’s irregular mood 
swings while under prescription medication for 
bipolarity. 

As noted above, our life journeys do not fol- 
low a smooth path. Irregularities and uneven 
slopes of development are “givens” in the
human condition, and thus the idiosyncratic 
deservedly earns its place in the matrix. The 
idiosyncratic is not a deviation from a smooth 
observable pattern of change—the idiosyncratic 
is part of the pattern if its occasional or even 
frequent occurrence can be explained. 


MISSING 

This cell includes summary observations that 
answer the question, What is missing through 
time? During participant observation, the 
researcher certainly notes what is present in the 
social setting, but should also consider what is 
absent or missing. Examples may include: a lack 
of material resources for more effective class- 
room instruction; a vacant staff position caus- 
ing other employees to carry the workload; or 
Barry’s lack of theatre experiences during his 
junior high school years. “Just because some- 
thing is missing doesn’t mean that nothing is 
being influenced and affected” (Saldaña, 2003,
p. 166). Certainly we can identify a multitude of 
absent phenomena and actions in any social set- 
ting, but in this cell we note that which is most 
possibly and plausibly missing as they relate to 
what is present. 

Once the summary longitudinal qualitative data
for a time-apportioned matrix page have been
entered in the first row, the next step is to
compare (with the exception of the first or
genesis page) how these current observations
differ from those logged in other matrix pages
about the participants’ or environment’s
chronological past (and future).
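
For readers who keep their matrices electronically, the top row of one matrix page can be sketched as a small data structure. This is a minimal sketch and not part of the original paper-and-pencil method: the cell names follow the headings above, while the class name, field names, and `add` helper are hypothetical. The two sample entries come from Barry's fourth data pool.

```python
from dataclasses import dataclass, field

# The seven descriptive cells of the top row, named as in the chapter.
DESCRIPTIVE_CELLS = (
    "INCREASE/EMERGE",
    "CUMULATIVE",
    "SURGE/EPIPHANY/TURNING POINT",
    "DECREASE/CEASE",
    "CONSTANT/CONSISTENT",
    "IDIOSYNCRATIC",
    "MISSING",
)


@dataclass
class MatrixPage:
    """One time-apportioned matrix page (hypothetical representation)."""

    label: str
    ages: str  # kept as text, e.g. "18.5-23.3", exactly as logged
    cells: dict = field(
        default_factory=lambda: {name: [] for name in DESCRIPTIVE_CELLS}
    )

    def add(self, cell, entry):
        """Append one bullet-point observation to a descriptive cell."""
        if cell not in self.cells:
            raise KeyError(f"unknown cell: {cell}")
        self.cells[cell].append(entry)


page4 = MatrixPage(label="fourth data pool", ages="18.5-23.3")
page4.add("INCREASE/EMERGE", "searching for 'artful living'")
page4.add("INCREASE/EMERGE", "learning ASL")

for name, entries in page4.cells.items():
    print(name, entries)
```

Keeping every cell present, even when it is empty, mirrors the printed matrix: an empty cell stays visible and can prompt the question of what went unrecorded.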


DIFFERENCES ABOVE FROM PREVIOUS 
DATA SUMMARIES 

For each cell in the top portion of the matrix, 
similar observations from previous matrices are 
used to compare the current conditions. This 
row spirals the researcher back to previous 
observations of similar change processes and 
integrates them. It is not necessarily data that 
are entered in these cells but rather brief ana- 
lytic jottings. 


For example, in Barry’s fourth data pool 
at ages 18.5-23.3, one of the entries in the 
INCREASE/EMERGE cell was: “searching for 
‘artful living’.” Previous data pools in the 
INCREASE/EMERGE cells included references 
to his “additional theatre-viewing experiences” 
from “parental involvement in nurturing his 
theatre interest,” and then “performance oppor- 
tunities” under “mentorship from theatre teach- 
ers,” which led to an “attitude ‘renaissance’” 
and a “passion” for the art form. But at ages 
18.5-23.3, theatrical performance was no longer
part of Barry’s life course and thus no reference 
to it is included in the INCREASE/EMERGE cell 
for this particular matrix page. (It is, however, 
noted in the DECREASE/CEASE and MISSING 
cells.) 

Of course, it would be foolish to isolate and 
strictly compare INCREASE/EMERGE data only 
with other INCREASE/EMERGE cells. What 
increases or emerges in a participant’s life 
course must also be compared and consid- 
ered with other change processes—what has 
decreased, what is missing, what epiphanies 
occurred, and so on. If you haven’t already, 
notice that there are dashed and not solid lines 
separating each cell in the matrix. This was a 
deliberate artistic and methodological choice to 
suggest that “the ocean—not the landscape—of 
longitudinal qualitative data knows no bound- 
aries and has open access to flow where 
needed” (Saldaña, 2005, p. 7). Thus, one entry
in this second row’s INCREASE/EMERGE cell 
includes the analytic jotting: 


■ Barry’s new search for “artful living”: an
  attempt to fill an emotional void left by
  nonparticipation in theatre.


This inferential linking is just one example 
of how the researcher compares and syn- 
thesizes qualitative data across cells in the 
same row. 
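
The comparison with previous data summaries can be sketched as follows. This is a rough illustration under a strong assumption: that each INCREASE/EMERGE entry has been given a researcher-assigned theme tag. The tags (`theatre`, `faith`, `artful living`) and the function names are hypothetical, and the entries are abbreviated from the data pools listed earlier.

```python
# INCREASE/EMERGE entries per matrix page as (entry, theme) pairs.
# Entries are abbreviated from the chapter; theme tags are hypothetical codes.
pages = [
    ("pool 1", [
        ("additional theatre-viewing experiences", "theatre"),
        ("parental involvement in nurturing his theatre interest", "theatre"),
    ]),
    ("pool 2", [
        ("attitude 'renaissance' from performance opportunities", "theatre"),
    ]),
    ("pool 3", [
        ("roles in theatre productions", "theatre"),
        ("'passion' for the art form", "theatre"),
        ("questioning his spiritual faith/belief systems", "faith"),
    ]),
    ("pool 4", [
        ("searching for 'artful living'", "artful living"),
        ("attending a different church but same faith", "faith"),
    ]),
]


def themes(entries):
    """Collect the set of theme tags used in one cell."""
    return {theme for _entry, theme in entries}


def compare_with_previous(pages, index):
    """Compare one page's themes with all chronologically earlier pages."""
    current = themes(pages[index][1])
    earlier = set()
    for _label, entries in pages[:index]:
        earlier |= themes(entries)
    return {
        "new": sorted(current - earlier),
        "continuing": sorted(current & earlier),
        "no_longer_listed": sorted(earlier - current),
    }


summary = compare_with_previous(pages, 3)
for key, value in summary.items():
    print(key, value)
```

The output is a prompt for an analytic jotting, not a finding. A theme that is no longer listed still has to be checked against the DECREASE/CEASE and MISSING cells, and against the possibility that later fieldnotes simply stopped recording it.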

Now that the descriptive cells have been dis- 
cussed, we examine how these observations of 
difference and thus change are nested within a
broader social scheme. 


CONTEXTUAL/INTERVENING CONDITIONS
INFLUENCING/AFFECTING CHANGES 
ABOVE 

This portion of the matrix is reflective space 
for jotting how, how much, in what ways, and 
why the descriptive observations in the top
portion of the matrix may have occurred (Saldaña,
2003, p. 161). This is where analysis begins to
transform and transcend the descriptive field 
observations into deeper insight as to indi- 
vidual participant agency, sociocultural factors 
and environments, historic events leading to the 
time period, and other major contributors that 
influence and affect change. 

The difference between a contextual and an 
intervening condition is a matter of researcher 
interpretation, but let me explain briefly how I 
distinguish between the two. Contextual condi- 
tions are the “givens” of everyday social life— 
our established environments, structures, pro- 
cesses, and actions that contain and compose 
the repertory of human routines. Contextual 
conditions also include our personal “givens”— 
gender, ethnicity, habits, etc. A child learning 
addition, a church’s congregation worshipping 
on Sunday morning, an Hispanic social ser- 
vice agency servicing its community, and Barry 
attending high school, are examples of contex- 
tual conditions. Yet even within these everyday 
contexts, change can happen: the child learns to
add one-digit figures, then progresses to
subtraction; the congregation welcomes new
members into their church body and increases in
size; the social service agency increases its vol- 
untary staff by two when Hispanic interns join 
the team for a year; and Barry takes typical 
school courses and works toward earning a high 
school diploma. 

But when a condition is “perceived as a
purposeful, unanticipated, or significant action,
structure, or process that influences and affects
participant change through time” (Saldaña,
2003, p. 162), the contextual has now become
the intervening. “Since contexts are contextual,
some conditions initiate greater change
than others” (ibid.). When a child learns to
add more quickly than peers through an
experimental teaching approach to mathematics using
computers; when the congregation increases in 
size because a neighboring church’s pastor has 
become involved in a sex scandal; when the 
Hispanic social service agency loses one of its 
primary sources of financial support due to 
their perception of racism in local government 
and must now rely more on voluntary staff 
assistance; and when Barry enrolls in theatre 
courses and participates in extracurricular play 
productions under the direction of a nurturing 
mentor, which eventually motivate him to cease 
illegal drug use; these are examples of interven- 
ing conditions and their consequent influences 
and affects on participants. 

“Influences and affects” is my qualitative 
replacement for the positivist construction of 
“cause and effect.” First, “influences” sug- 
gests that there are multiple, networked inter- 
actions and complex interplay between and 
among conditions that drive change. Sec- 
ond, the consequences of those changes create 
“affects”: multiple outcomes that cognitively,
emotionally, and sometimes physically affect, 
in rippling fashion, participants and their social 
environments. 

This horizontal row is composed of cells that 
ask you to inductively and/or deductively infer 
from the descriptive data above the conditions 
that led to the observations to date. Like the row 
above this one (DIFFERENCES ABOVE FROM 
PREVIOUS DATA SUMMARIES), the task is for 
the researcher to compare and synthesize qual- 
itative data across cells. 


INTERRELATIONSHIPS 

This cell includes researcher reflections 
that answer the question, Which changes 
interrelate through time? Interrelationships 
(“correlations” in quantitative parlance) are 
researcher-constructed perceptions of how
individual phenomena, actions, social struc- 
tures, change processes and so on, share var- 
ious types of jointure. Examples may include: 
the complex interaction and interplay of social 
institutions (family, community, school, peers, 
media, etc.) influencing and affecting a young 
person’s attitude, value, and belief systems; a 
teacher’s strategic use of voice (volume, rate, 
pitch, etc.) in varying qualities to influence 
and affect student dynamics and thus cre- 
ate consciously-desired learning environments; 
and Barry’s immersion in illegal drug use, the 
church, and theatre in varying degrees and over- 
lapping phases from ages 12 through 26 as 
quests for spiritual fulfillment. 

Each cell thus far contains essentialized 
data. And in my originally designed matrix 
I included the parenthetical direction in the 
INTERRELATIONSHIPS cell: “circle and con- 
nect above, then analyze,” meaning: draw lines 
to link the cells or entries that correlate, com- 
parable to a connect-the-dots drawing. Later I 
learned that 


this proved to be both ridiculously bravado advice 
and a nearly impossible task. Interrelationships are 
human constructions of immense complexity... and 
my own mind saw the intricate webbing of inter- 
action and interplay between virtually all items in 
the matrix. As Barry’s mother herself told me in an 
interview when I asked her to identify where Barry’s 
interest in theatre came from, the influences were “a 
great big conglomeration” of factors and experiences 
that were “just very tightly interwoven. I don’t even 
know how we can separate them.”... As I scanned 
the matrix, I didn’t even know how I could connect 
them. (Saldaña, 2005, pp. 14-15)


I’ve observed that, whether it be the way data 
are strategically logged in the matrix or some 
ironic master plan of life comparable to a grand 
unification theory, there is interrelationship 
between increases and decreases; between the 
constant and the missing; between and among 
the cumulative and contextual; and between
and among any other possible and infinite cell
combinations. But how do you know where to 
begin and when to stop? Not everything that 
interrelates from your perspective may be valid, 
and any assertions of causality stand on shaky 
ground in observational studies. Interview data 
from participants on the influences and affects 
on their lives will supplement and better inform 
researcher speculation about connecting chains 
of action and environmental factors at work: 


The conceptual process for analyzing both interre- 
lationships and change in longitudinal qualitative 
data is like observing various colored dyes mixing 
together in the natural currents of water. ... The even- 
tual discovery of interrelationships between ponds 
and pools of data is one that emerges after multi- 
ple readings of the corpus, data reduction into visual 
displays, and extended reflection on all possible
connections and overlap. (Saldaña, 2003, p. 168)


CHANGES THAT OPPOSE/HARMONIZE 
WITH HUMAN DEVELOPMENT/SOCIAL 
PROCESSES

This cell includes researcher reflections that 
answer the question, Which changes through 
time oppose or harmonize with natural human 
development or constructed social processes? 
The observations you make of social life may 
or may not harmonize with or conform to exist- 
ing research in your discipline. Examples may 
include: a first-year, inner-city teacher’s class- 
room management difficulties, comparable to 
other teachers’ experiences in similar settings 
during the early stages of their professional 
development; a not-for-profit company’s suc- 
cess, whose unexpected patterns of growth and 
development contradict previous observations 
and research in established organizational stud- 
ies; and Barry’s epiphanic calling to serve God 
through the ministry, an experience similar to 
other spiritual leaders’ awareness of their life 
purpose. 

Your literature review and previous research 
experiences about the current study will inform 
you whether your longitudinal observations 
align themselves with established social
processes, or whether they oppose what may
be considered normative. Unique cases may
present qualitative trajectories that contradict
or supplement long-standing findings from
developmental or organizational studies. Both
contextual and intervening conditions, coupled
with hypothesized interrelationships (and, in
some cases, participant or conceptual rhythms,
discussed below), weave together to create a
complex tapestry of influences and affects.

This portion of the matrix asks you to relate 
your own study’s observations with the classic 
and current research in your field to assess how 
participants stay or stray from the anticipated 
courses of development and action. Such reflec- 
tions can support, build on, refute, or revise the 
extant literature. Analyzing an individual case 
study or small group of participants in a single 
site, albeit long-term, may not generate enough 
data for a convincing argument of transferabil- 
ity to other settings. A larger number of par- 
ticipants and multiple observation sites more 
convincingly support any assertions of gener- 
alizability and theory construction for broader 
social contexts. 


PARTICIPANT/CONCEPT RHYTHMS 

The PARTICIPANT/CONCEPT RHYTHMS cell 
includes researcher reflections that answer the 
question, What are participant or conceptual 
rhythms (phases, stages, cycles, etc.) through 
time? Depending on the nature of your research 
question or study’s goals, you may observe that 
social life and its accompanying changes can 
be “apportioned into theoretical periodicities of 
human action” (Saldaña, 2003, p. 169). These
periodicities can be constructed as phases, 
stages, cycles, or other forms and combinations 
of time-based clusters. Barry himself labeled 
some of the phases and stages of his own 
life course as “a dead period,” “romantic,” the 
search for “artful living,” and an intense period 
of “academic rigor.” 


The intuitive or systematic division of your 
data corpus into varying time periods for mul- 
tiple matrices may prompt you to investigate 
whether each matrix page is a constituent unit 
of coterminous yet varying participant rhythms, 
much like time signatures, tempos, and mea- 
sures are parts of written music. It may not 
always be possible to state with precision when 
or under what specific contexts the beginning 
and ending of a phase, stage, or cycle are initi- 
ated and concluded. Nevertheless, you should 
develop a label that encases and describes the 
rhythmic cluster, with theoretical explanations 
for how, why, or in what ways a participant 
enters the apportioned period and transitions 
from one through the next. 


PRELIMINARY ASSERTIONS AS DATA 
ANALYSIS PROGRESSES 

The largest matrix cell is critical space for the 
longitudinal researcher to “think out loud.” 
This cell includes reflections on how every- 
thing above it blends together and (literally) 
trickles down. However, memo-generation and 
assertion development (Erickson, 1986) are not 
reserved exclusively for the final stage of the 
analytic cycle. Whenever a connection, insight, 
or even a question occurs to you, no matter at 
what phase or stage of data entry or analysis, 
write it down in this cell. Examples from the 
study with Barry include: 


■ Barry has always had a rigorous physical
  outlet in one form or another: football, acting,
  weightlifting, rock climbing.
■ Throughout adolescence, Barry has exhibited
  high inter- and intrapersonal intelligences,
  traits Howard Gardner says are conducive to
  performers and the clergy.
■ Ironically, drugs are still playing a role in
  Barry’s life: before, they were illegal and
  recreational for his depression; now, for
  his bipolarity, the drugs are prescription, to
  help him with his depression.


You may find yourself scanning the matrix
cells vertically, horizontally, diagonally, sys- 
tematically, and even randomly to synthesize 
the data and find answers as if this were some 
type of elusive “seek and find” word game. 
My guidance is, “Whatever works.” I have 
concluded that the “longitudinal qualitative 
researcher’s analytic process is neither com- 
pletely linear nor holistic. It is iteratively—if 
not erratically—cumulative and serendipitous 
in knowledge building” (Saldaña, 2005, p. 15).

I prefer to work with hard copy and pen- 
cil when I fill in these matrix pages, rather 
than enter the data into a word-processed 
file. This means I have to erase and rewrite, 
rather than delete and rewrite, my assertions in 
progress. And assertions are almost always in 
progress, for as data are gathered and entered 
into future matrix pages, these observations 
may put previously developed assertions in 
new perspectives and contexts. Barry’s psycho- 
logical counseling at age 12 and his two suicide 
attempts during adolescence did not overtly 
foreshadow his formal diagnosis of bipolarity 
at 18 years of age. But once this diagnosis 
was learned, his tragic past and actions “made 
sense,” so to say, and the earlier-developed 
assertions about his mental and emotional con- 
ditions now required revision due to newly 
acquired disconfirming evidence. 


THROUGH-LINE 

The THROUGH-LINE cell includes researcher 
reflections that answer the question, What is the 
through-line of the study? The through-line is 
a word, sentence, paragraph, and/or extended 
narrative that captures the essence and essen- 
tials of a participant’s journey and change (if 
any) through time. The through-line “includes 
references to time, processual terms, and mark- 
ers (beginnings, middles, and/or endings of 
the journey at various locations through time)” 
(Saldaña, 2003, p. 170). Examples include:


■ From his sophomore through senior years
  in high school, Barry gradually interchanged
  the insufficient spiritual fulfillment he
  received at church with the more personal
  and purposeful spiritual fulfillment he
  experienced through theatre (Saldaña, 2003,
  p. 154).
■ He ascends. From ages five through twenty-six,
  Barry has sought ascension in both literal
  and symbolic ways to compensate for
  and transcend the depths he has experienced
  throughout his life course (Saldaña, 2005,
  p. 18).


A through-line tends to evolve toward the 
end of the analytic cycle, but it is not necessar- 
ily the final required outcome from qualitative 
data analysis. The through-line is a summary 
statement—a section heading, topic sentence, 
or theme, if you will—that centralizes the nar- 
rative of a longitudinal qualitative study. This 
can emerge from researcher reflection about 
the data, or from the participant’s own per- 
ceptions (and in his or her own language as 
an “in vivo” code [Strauss and Corbin, 1998]) 
about the period under investigation. 

Recent scholarship (Clarke, 2005; Kincheloe, 
2005) advocates that the “messiness, com- 
plexity, and interconnectedness of social life 
cannot be captured through such reduction- 
ist methods and are thus futile endeavors. 
But... the through-line doesn’t negate the com- 
plexity of a life course, ... [it] distills the ocean 
of longitudinal qualitative data” (Saldaña, 2005,
pp. 18-19). The through-line helps navigate the 
researcher’s journey as he or she writes the final 
epic of his or her participants’ changes (if any) through
time. 


2 Final comments 


James A. Holstein and Jaber F. Gubrium (2000),
in Constructing the Life Course, remind us that


The life course and its constituent parts or stages are 
not the objective features of experience that they are 
conventionally taken to be. Instead, the construction- 
ist approach helps us view the life course as a social 
form that is constructed and used to make sense of
experience.... The life course doesn’t simply unfold 
before and around us; rather, we actively organize the 
flow, pattern, and direction of experience in devel- 
opmental terms as we navigate the social terrain of 
our everyday lives (p. 182). 


The Longitudinal Qualitative Data Summary 
matrix is offered as an organizational method to 
chart the flows, patterns, and directions of par- 
ticipant experiences and differences and thus 
change through time. It is a signature technique 
that I hope has utility and transferability to 
your own analytic practice. By no means is this 
matrix proposed as the perfect or only model 
available. I encourage you to adapt the con- 
tents and format to suit your own particular 
research project and goals. If qualitative data 
are fluid, then our instrumentation and displays 
must exhibit comparable fluidity. 

As a theatre artist, I approach most of my 
research projects dramaturgically—meaning, 
my observations and analyses of social settings 
are filtered through a “life as drama” frame- 
work. When I analyze participant data from 
interview transcripts, I can’t help but apply 
my training in character analysis gleaned from 
acting classes. When I analyze physical envi- 
ronments and participant dress and artifacts, 
my education in the principles of theatrical 
design emerges. And when I observe social life 
performed in my presence, I cannot help but 
apply my directing and playwriting experiences 
to the collection and analysis of ethnographic
data. 

Each site visit is a scene or act of a (very 
long) play. Participants are like characters, each 
one with objectives, obstacles, tactics, flaws, 
and emotions. Each one develops and inter- 
acts with other characters who have anywhere 
from inconsequential to significant impact on 
their lives. But their futures are not scripted 
and predetermined by a playwright; their lives 
are improvisationally lived. Yes, there is rou- 
tine and constancy and hopefully stability to 
their lives. But nested within and breaking 


through these repetitive patterns are sponta- 
neous actions that force them—and us—to deal 
with life as it is complexly and challengingly 
lived. Sometimes we can predict with statis- 
tical precision, based on selected variables of 
interest, what pathways or outcomes lie ahead. 
But most times the future is unknown. “What 
happens next?” is the central question in the 
audience’s minds as we watch an engaging play, 
film, or television story unfold before us. “What 
happens next?” is also the driving question 
for longitudinal observational fieldwork and its 
concurrent qualitative data analysis. 


Glossary 


Assertions Descriptive, analytic, or interpre- 
tive summary statements, derived from qualita- 
tive data, of phenomena, participant actions, or 
social environments. 


Change processes Category-based transformations
through time (e.g., “increase,” “decrease,”
“idiosyncratic”).


Dynamics The dimensions and variability of 
qualitative data, expressed through carefully- 
selected language. 


Matrix An interrelated, multicelled chart for 
qualitative data entry and analysis. 


Qualitative observational data Written field- 
notes gathered primarily from participant 
observation in naturalistic social settings. 
Chapter 20


Configural frequency analysis 
of longitudinal data 
Alexander von Eye and Eun Young Mun 


1 Introduction 


In this chapter, first, the perspectives are 
discussed that researchers take when apply- 
ing configural frequency analysis (CFA). Specif- 
ically, when applying CFA, the focus is on 
individuals who differ in profiles or patterns 
instead of relationships among variables. A 
tutorial of CFA is given. In the section on long- 
itudinal methods of CFA, four ways of data 
analysis are presented. The first involves the 
analysis of group-specific temporal patterns. 
Using this method, one can identify the pat- 
terns of development that groups differ in. This 
method is compared with odds ratio analysis. 
The second longitudinal CFA method allows 
one to analyze trends over time. Specifically,
the shapes of change curves are analyzed, e.g.,
the linear and the quadratic trends. We show
that the simultaneous analysis of trends
that differ in order (e.g., linear and quadratic)
can lead to cells in cross-classifications that
contain structural zeros. The third CFA method
for the analysis of longitudinal data involves 
taking a priori probabilities into account. The 
fourth method allows one to compare specific 
transition patterns with each other. Empirical 
examples are given using data from a study on 
the development of aggression in adolescence. 


Models for longitudinal data contain param- 
eters that are unique in the sense that they 
are not part of models for cross-sectional data. 
These parameters concern, for instance, the lin- 
ear trend, the acceleration of the linear trend, 
the change in acceleration, transition patterns, 
patterns of constancy, and information concern- 
ing the association structure of variables and its 
change over time. Configural frequency analysis 
(CFA; Lienert and Krauth, 1975; von Eye, 2002; 
von Eye and Gutiérrez-Peña, 2004) is a method
for the examination of multivariate categorical 
data. CFA allows one to study any parameter 
of longitudinal data, in particular if it can be 
placed in the context of categorical data. 

This chapter presents a selection of methods 
of longitudinal CFA. Specifically, we describe 
and illustrate (1) CFA of differences; (2) CFA 
of trends; and (3) CFA of symmetry in change. 
Before covering these facets of CFA of longitudinal
data, we present an introduction to the
concepts and methods of CFA.


2 CFA—a tutorial 


Consider the cross-classification of d cate- 
gorical variables. The cells of this cross- 
classification contain the observed frequencies 
of each pattern of variable categories, called a
configuration. Using log-linear models, cross-classifications
can be analyzed, focusing on the
joint frequency distribution, the dependency 
structure, and the association structure of the d 
variables (Goodman, 1984). CFA offers an alter- 
native perspective. It allows one to examine 
individual cells of a cross-classification. Let the
observed cell frequency of a cell be denoted by
$m_r$, and the corresponding estimated expected
frequency by $e_r$, where r goes over all cells
of the cross-classification. Then, based on the
comparison of $m_r$ with $e_r$, CFA states either that


• Cell r constitutes a CFA type if $m_r > e_r$, or
• Cell r constitutes a CFA antitype if $m_r < e_r$.


If $m_r = e_r$, Cell r constitutes neither a type nor
an antitype. The null hypothesis that must be
rejected for these decisions is


$$H_0: E[m_r] = e_r,$$


where E[...] indicates the expectancy, and $m_r$
and $e_r$ are defined as above. The expected frequency
$e_r$ is estimated under a model called the
base model. This model is specified such that 
it allows one to interpret types and antitypes 
in a clear-cut way (more detail follows later). 
In many cases, the base model is a log-linear 
model. In particular in the context of longitudi- 
nal data analysis other base models have been 
discussed also. This chapter will discuss two 
examples of such models. 


2.1 Testing cell-wise hypotheses in CFA


Let the d variables, $X_1, \ldots, X_d$, be crossed to
form a contingency table with $R = \prod_{i=1}^{d} c_i$ cells,
where $c_i$ is the number of categories of the ith
variable. Let the probability of Cell r be $\pi_r$,
with $r = 1, \ldots, R$. The frequency with which
Cell r was observed is $m_r$. The probabilities
of the R cell frequencies depend on the sampling
scheme of the data collection (von Eye
and Schuster, 1998; von Eye, Schuster and
Gutiérrez-Peña, 2000). The typical sampling
schemes are multinomial and product-multinomial.
Mixed schemes are applied also.
If sampling is multinomial, cases are randomly 
assigned to cells of a table, and there are no 
constraints except that the sample size is given. 
If sampling is product-multinomial, marginal 
frequencies are fixed, and assignment has to 
proceed such that the marginal frequencies are 
reproduced. 

In most cases, sampling is multinomial and
we obtain


$$P(m_1, \ldots, m_R \mid N, \pi_1, \ldots, \pi_R) = \frac{N!}{m_1! \cdots m_R!} \prod_{r=1}^{R} \pi_r^{m_r},$$


with $\sum_r \pi_r = 1$ and $\sum_r m_r = N$. Because of the
multinomial sampling scheme, $m_r$ is binomially
distributed with


$$P(m_r \mid N, \pi_r) = \frac{N!}{m_r!\,(N - m_r)!}\, \pi_r^{m_r} (1 - \pi_r)^{N - m_r}.$$

To test hypotheses about individual cells, one 
can use the binomial distribution, and one 
obtains 

$$B_{N,\pi}(x) = \sum_{j=0}^{x} \frac{N!}{j!\,(N - j)!}\, \pi^{j} (1 - \pi)^{N - j},$$


with $0 \le x \le N$. A number of alternative tests
have been proposed. The most popular among
these are the $X^2$ component test, the z test, and,
for product-multinomial sampling, exact and
approximate hypergeometric tests (Lehmacher,
1981). The $X^2$ component test is


$$X_r^2 = \frac{(m_r - e_r)^2}{e_r},$$


where, as before, r goes over all cells of the
cross-classification. This test comes with df = 1,
and it is well known that $\sqrt{X^2} = z$, the
standardized residual.

Lehmacher’s asymptotic hypergeometric test
can be described as follows. Considering that
the denominator of $X^2$ tends to overestimate
the variance of the normal distribution
when the model describes the data well (in
this case, $\sigma^2 < 1$), Lehmacher (1981) derived the
exact variance for Cell r,


$$\sigma_r^2 = N p_r \left[ (1 - p_r) - (N - 1)(p_r - \tilde{p}_r) \right],$$


where N is the sample size, $p_r$ is the probability
of being assigned to Cell r, and $\tilde{p}_r$ is estimated
to be


$$\tilde{p}_r = \frac{1}{(N - 1)^d} \prod_{j=1}^{d} (N_{k_j} - 1),$$


where $k_j$ indexes the univariate marginal frequencies
of variable j. The probability $p_r$ is estimated
using the CFA base model. In standard
CFA, under product-multinomial sampling, the
base model is the model of variable independence,
i.e., the main effect model. Lehmacher’s
test statistic is


$$z_{L,r} = \frac{m_r - e_r}{\sigma_r}.$$


This test statistic is asymptotically standard normally
distributed. Because $p_r > \tilde{p}_r$, Lehmacher’s
z will always be larger than $\sqrt{X^2}$, and the
test has more power to classify configurations
as constituting types or antitypes than
$\sqrt{X^2}$. To prevent non-conservative decisions,
Küchenhoff (1986) recommended using a continuity
correction.

A number of theoretical and simulation studies
have been undertaken to identify the best
performing among the many tests that have
been proposed for CFA (e.g., Indurkhya and von
Eye, 2000; von Eye and Mun, 2003; von Weber,
Lautsch and von Eye, 2003a; von Weber, von
Eye and Lautsch, 2004). The results of these
comparisons are that

1. Whenever possible, exact tests are to be pre- 
ferred. These are, for instance, the binomial 
and the exact hypergeometric tests. 

2. Of the asymptotic tests, none always performs
the best. However, the $X^2$ component
(and the z) test, and the two procedures
proposed by Perli, Hommel, and
Lehmacher (1985) and by von Weber et al.
(2004) cover 90% of the best solutions. This
applies both to keeping the nominal α and to
achieving reasonably low β levels.

3. When samples are small, antitypes are hard 
or impossible to detect. When samples 
are large, antitypes are more likely to be 
detected. This applies to all tests with the 
exception of Lehmacher’s test for very small 
tables, for which it detects equal numbers of 
types and antitypes. 
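
The cell-wise tests discussed in this section can be sketched in a few lines of Python. This is an illustration only: the 2 × 2 × 2 counts are made up, the function names are hypothetical, and the base model is assumed to be the main effect (independence) model, so that $e_r = N p_r$ with $p_r$ computed from the univariate margins. The symbol `p_tilde` follows the usual statement of Lehmacher's variance and is likewise an assumption about notation.

```python
import math

def binomial_upper_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p): one-sided exact test for a type."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m, n + 1))

def cellwise_tests(table, cell):
    """Return (e_r, z, Lehmacher z, binomial upper tail) for one cell.

    table maps configuration tuples, e.g. (0, 1, 1), to observed counts and
    must list every cell of the cross-classification.
    """
    d = len(cell)
    n = sum(table.values())
    # Univariate marginal frequencies of the categories that define `cell`.
    margins = [sum(m for cfg, m in table.items() if cfg[j] == cell[j])
               for j in range(d)]
    p = math.prod(margins) / n**d                                # p_r under independence
    p_tilde = math.prod(nk - 1 for nk in margins) / (n - 1)**d   # Lehmacher's p~_r
    e = n * p
    m = table[cell]
    z = (m - e) / math.sqrt(e)                                   # signed root of the X^2 component
    var_l = n * p * ((1 - p) - (n - 1) * (p - p_tilde))          # Lehmacher's variance
    z_l = (m - e) / math.sqrt(var_l)
    return e, z, z_l, binomial_upper_tail(m, n, p)

# Hypothetical counts for a 2 x 2 x 2 table (not data from the chapter).
counts = {(0, 0, 0): 30, (0, 0, 1): 10, (0, 1, 0): 12, (0, 1, 1): 8,
          (1, 0, 0): 9, (1, 0, 1): 11, (1, 1, 0): 10, (1, 1, 1): 30}

e, z, z_l, p_binom = cellwise_tests(counts, (0, 0, 0))
print(f"e = {e:.2f}, z = {z:.2f}, Lehmacher z = {z_l:.2f}, binomial p = {p_binom:.5f}")
```

For these made-up counts, Lehmacher's z exceeds the standardized residual, in line with the statement that the test is the more powerful of the two. None of the resulting p-values is yet protected against multiple testing.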


2.2 Protecting α


The typical application of CFA is in an 
exploratory context. In this context, it is 
unknown whether types and antitypes will 
emerge, and where they may emerge. Therefore, 
all R cells in a table are examined. Clearly, the 
number of tests can be large when the size of 
a table is large. The factual significance level α
can be guaranteed to be equal to the nominal 
level only for the first of a series of tests on the 
same sample. 

There are two reasons why, for a large number
of tests, the factual and the nominal levels
of α differ from each other. The first reason
is the mutual dependence of multiple tests. 
The extreme case is a 2 x 2 table which is ana- 
lyzed under the main effect model. von Weber, 
Lautsch and von Eye (2003b) showed that, 
under these conditions, the results of all CFA 
tests after the first test are completely depen- 
dent upon the result of the first. When the 
size of the tables increases (both in number of 
dimensions and number of variable categories), 
this dependency becomes less pronounced. 
However, tests never become completely independent
of each other (see also Steiger,
Shapiro and Browne, 1985). Because of this
interdependency, the factual level α can represent
a severe underestimation of the nominal
level α.

The second reason, exacerbating the first, is
that of capitalizing on chance. In the present
context, this means that because researchers
expose themselves repeatedly to the risk
(of size α) of committing an α-error, the probability
increases beyond α with each test. von
Eye (2002) presents the following example. If
a researcher examines each of the 27 cells in a
3 × 3 × 3 table under the nominal level α = 0.05, the
probability of committing a Type I error at least three
times is p = 0.1505, even if the tests are, in truth,
independent. In other words, with probability
0.1505, at least three of the 27 type/antitype decisions
will be wrong.
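
The value 0.1505 can be checked with a short binomial calculation. The sketch below assumes 27 independent tests, each carried out at α = 0.05; it shows that 0.1505 is the probability of three or more Type I errors. The function name is hypothetical.

```python
import math

def prob_at_least(k, n_tests, alpha):
    """P(at least k Type I errors) among n_tests independent tests at level alpha."""
    fewer = sum(math.comb(n_tests, j) * alpha**j * (1 - alpha)**(n_tests - j)
                for j in range(k))
    return 1 - fewer

R = 27         # number of cells in a 3 x 3 x 3 table
alpha = 0.05   # nominal significance level

p_one_or_more = prob_at_least(1, R, alpha)
p_three_or_more = prob_at_least(3, R, alpha)
print(round(p_one_or_more, 4))    # probability of at least one wrong decision
print(round(p_three_or_more, 4))  # probability of at least three wrong decisions
```

Under the same assumptions, the probability of at least one wrong decision is roughly 0.75, which is why unprotected cell-wise testing is not recommended.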

Because of the great danger of making wrong
decisions in CFA, protection of α has become
routine in CFA applications. For exploratory
applications, Perli, Hommel, and Lehmacher
(1985) proposed protecting the local level α. This
method guarantees that, for each cell-wise test,
the factual α does not exceed the nominal level α.
However, protection of α comes with a price: the
decisions concerning the cell-wise null hypotheses
become far more conservative. In fact, unless
samples are really large, types and antitypes may
become extremely hard to detect.

A number of procedures to protect α have been
proposed. The most popular procedure, known
as Bonferroni adjustment, is also the most conservative.
Let the number of tests be R. Then,
the Bonferroni procedure requires that two conditions
be met. First, the sum of all individual
$\alpha_r$ does not exceed the nominal α, i.e.,


$$\sum_r \alpha_r \le \alpha,$$


where r goes over all tests. Second, the procedure
requires that all $\alpha_r$ be equal, or $\alpha_r = \alpha^*$
for all r, where $\alpha^*$ is the adjusted significance
threshold. The adjusted threshold that meets
both requirements is $\alpha^* = \alpha / R$.

To illustrate the effect of this adjustment, consider
the example with the 3 × 3 × 3 table again.
For 27 tests, the adjusted significance threshold
is no longer α = 0.05. Instead, it is α* =
0.05/27 = 0.00185. The two-sided critical z-score for
the nominal α is ±1.96. For the adjusted α*, the
one-sided critical z-score is 2.901 (the two-sided
value is ±3.11).
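
The arithmetic of the Bonferroni adjustment can be reproduced with the standard library. The sketch below reports one-sided and two-sided critical values side by side, because the two conventions give noticeably different z-scores; which convention a given CFA program uses is not assumed here.

```python
from statistics import NormalDist

alpha = 0.05
R = 27                      # number of cell-wise tests in a 3 x 3 x 3 table
alpha_star = alpha / R      # Bonferroni-adjusted threshold

std_normal = NormalDist()   # standard normal distribution (mean 0, sd 1)
z_nominal_two_sided = std_normal.inv_cdf(1 - alpha / 2)
z_adjusted_one_sided = std_normal.inv_cdf(1 - alpha_star)
z_adjusted_two_sided = std_normal.inv_cdf(1 - alpha_star / 2)

print(f"alpha* = {alpha_star:.5f}")
print(f"nominal, two-sided:  {z_nominal_two_sided:.3f}")
print(f"adjusted, one-sided: {z_adjusted_one_sided:.3f}")
print(f"adjusted, two-sided: {z_adjusted_two_sided:.3f}")
```

The adjusted one-sided critical value is about 2.90 and the adjusted two-sided value is about 3.11, compared with 1.96 for the unadjusted two-sided test.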

Because Bonferroni’s procedure is so
extreme, a number of less conservative procedures
have been proposed. Here, we review the
procedures proposed by Holm (1979) and by
Hommel (1988; 1989). Holm proposed to take
into account the number of tests performed
before the rth test. Taking this number into
account yields a protected $\alpha^*$ that becomes less
conservative as the number of tests already
performed increases. Specifically, Holm’s protected
significance threshold is


$$\alpha_r^* = \frac{\alpha}{R - r + 1},$$


where r is the number of the test currently performed,
for $r = 1, \ldots, R$. Because the number
of tests performed before the rth is taken into
account, the order of tests is no longer arbitrary.
Therefore, the test statistics must be rearranged
in descending order. Thus, the largest test statistic
is considered first, the second largest after
that, etc. When, instead, the tail probabilities
that can be calculated for the exact tests are
ordered, the order is ascending. The ordering requires
additional work. However, this amount of work is
compensated in part by the fact that the testing
of configurations concludes as soon as the first
null hypothesis can be retained.

To illustrate the less conservative nature
of Holm’s procedure, we compare it with
Bonferroni’s procedure. For the first test (r = 1),
one obtains $\alpha_1^* = \frac{\alpha}{R - 1 + 1} = \frac{\alpha}{R}$. That is, for the
first test, the Holm and the Bonferroni procedures
require the same threshold. Beginning
with the second test, however, the Holm procedure
is less restrictive. For r = 2, one obtains
$\alpha_2^* = \frac{\alpha}{R - 2 + 1} = \frac{\alpha}{R - 1}$, which is larger than the
Bonferroni $\alpha^*$. Holm’s
thresholds become increasingly less restrictive
until the Rth test, for which one obtains
$\alpha_R^* = \frac{\alpha}{R - R + 1} = \alpha$.

For two- and three-dimensional tables,
Holm’s procedure can be made even less restrictive.
Here, we present the results for three-way
tables. The adjusted α-level is, for r = 1, $\alpha_1^* = \alpha / R$.
As for Holm’s procedure, this value is the
same as for Bonferroni’s procedure. However,
for r = 2, 3, 4, and 5, one obtains
$\alpha_2^* = \ldots = \alpha_5^* = \alpha / (R - 4)$.
These values are less restrictive than
the adjusted levels under the Bonferroni and
the Holm procedures. The remaining tests are
performed under the same adjusted thresholds
as under the Holm procedure.
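
The three sets of thresholds and the step-down logic can be sketched as follows. The function names and the eight tail probabilities are hypothetical, and the sketch assumes that a null hypothesis is rejected when its tail probability does not exceed the threshold. The three-way variant simply encodes the thresholds stated above and requires R ≥ 6.

```python
def holm_thresholds(alpha, n_tests):
    """Holm's protected thresholds alpha / (R - r + 1) for r = 1, ..., R."""
    return [alpha / (n_tests - r + 1) for r in range(1, n_tests + 1)]

def three_way_thresholds(alpha, n_tests):
    """Variant for three-way tables as described in the text: alpha / R for
    r = 1, alpha / (R - 4) for r = 2, ..., 5, and Holm's values afterwards."""
    holm = holm_thresholds(alpha, n_tests)
    return [holm[0]] + [alpha / (n_tests - 4)] * 4 + holm[5:]

def step_down(p_values, thresholds):
    """Test tail probabilities in ascending order and stop at the first null
    hypothesis that is retained. Returns True for each rejected hypothesis,
    in the original order of p_values."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    rejected = [False] * len(p_values)
    for step, idx in enumerate(order):
        if p_values[idx] > thresholds[step]:
            break                     # all remaining hypotheses are retained
        rejected[idx] = True
    return rejected

alpha = 0.05
# Hypothetical tail probabilities for the R = 8 cells of a 2 x 2 x 2 table.
p_vals = [0.0004, 0.30, 0.007, 0.0065, 0.012, 0.55, 0.04, 0.008]

bonferroni = [p <= alpha / len(p_vals) for p in p_vals]
holm = step_down(p_vals, holm_thresholds(alpha, len(p_vals)))
print("Bonferroni rejections:", sum(bonferroni))
print("Holm rejections:      ", sum(holm))
print([round(t, 5) for t in three_way_thresholds(alpha, 27)[:6]])
```

With these made-up tail probabilities, Bonferroni protection flags one cell while Holm's procedure flags five, which illustrates how much less restrictive the sequential thresholds can be.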


2.3 Base models for CFA


In the context of modeling, researchers attempt 
to devise the best-fitting model. Large model- 
data discrepancies are used as indicators of 
poor fit, and as hints at ways for model improve- 
ment. Using CFA, researchers pursue differ- 
ent goals (Lehmacher, 2000; von Eye, 2002). 
Specifically, using CFA, researchers attempt to 
find types and antitypes that are interpretable 
with reference to a particular base model. The 
base model must be tailored such that a spe- 
cific interpretation is possible. Because of this 
goal, (1) the specification of a base model is 
neither trivial nor arbitrary, and (2) model- 
data discrepancies do not lead automatically 
to a modification of the model. Instead, they 
lead to interpretable types and antitypes. The 
two approaches to data analysis, modeling and
search for types, share the characteristic that
model-data discrepancies are model-specific.
Therefore, if the same configuration emerges as 
a type or antitype for more than one base model, 
it can have quite different interpretations. 

von Eye (2004) has proposed a taxonomy of 
CFA base models (see von Eye and Schuster, 
1998; von Eye et al., 2000). This taxonomy clas- 
sifies CFA base models into four groups: (1) 
log-linear base models; (2) models with config- 
ural probabilities based on known population 
parameters; (3) models with configural proba- 
bilities based on a priori considerations (these 
models are of particular importance in longitu- 
dinal research); and (4) base models that reflect 
assumptions concerning the joint distribution 
of the variables under study (von Eye and Bogat, 
2004; von Eye and Gardiner, 2004). Mixed base 
models have been discussed also. In the follow- 
ing sections, we review the four groups of base 
models. 

In general, a CFA base model includes all 
those variable relationships that are not of 


interest to the researcher. Types and antitypes
can then emerge only if the relationships
of interest exist. There are three
criteria for base models (von Eye and Schuster, 
1998). 


1. The base model must allow for unique 
interpretation of types and antitypes. Types 
and antitypes reflect discrepancies between 
model and data. A model qualifies as a CFA 
base model if there is only one reason for 
these discrepancies. Examples of such rea- 
sons include interactions, main effects, and 
predictor-criterion relationships. 

2. The base model must be parsimonious. That 
is, the base model must contain as few terms 
as possible, and these terms must be of the 
lowest possible order (for methods to reduce 
the complexity of base models, see Schuster 
and von Eye, 2000). 

3. The base model must consider the sam- 
pling scheme. When sampling is multi- 
nomial, practically every conceivable base 
model is possible. However, when sampling 
is product-multinomial, the fixed marginals 
must be reproduced. This applies in partic- 
ular when the sampling scheme is product- 
multinomial in more than one variable. 
In this case, the base model must repro- 
duce the marginal probabilities in the sub- 
tables that result from crossing variables 
with fixed margins. Base models that do 
not consider these subtables are, therefore, 
inadmissible. 


Examples follow in the next section. 


Log-linear base models for CFA 

Log-linear models (Agresti, 2002; Goodman, 
1984) allow one to describe the joint 
distribution, the dependency structure, and the 
association structure in a cross-classification of 
categorical variables. Log-linear models are the 
most popular base models of CFA. There are 
two groups of log-linear base models. The first 
is the group of global base models. In these
models, all variables have the same status. 
There is no distinction between dependent and 
independent variables, predictors and criteria, 
or even separate groups of variables. 

There is a hierarchy of global models. A model
ranks higher in the hierarchy the more complex
the variable relationships are that the base
model takes into account. In ascending order,
the simplest base model is the null model.
It requires multinomial sampling. This model
assumes that no effects exist. Its log-linear
formulation is $\log m = \lambda$, i.e., it contains only
the constant term. Types and antitypes reflect
the existence of any effect. However, the nature
of these effects is, without further analysis, not
determined.

The next-higher model is that of variable 
main effects. This model is the independence 
model, also called model of first-order CFA. 
As the name indicates, this model takes main 
effects into account. Types and antitypes thus 
can emerge only if the variables interact. 

One level higher in the hierarchy is the model 
of first-order interaction, also called model of 
second-order CFA. This is a hierarchical model 
in the sense that it also takes into account the 
main effects of all variables. Types and anti- 
types can emerge only if second- or higher-order 
interactions exist. Higher-order models can be 
considered. However, to the best of our knowl- 
edge, there have been no applications of higher- 
order models. 
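
For the two lowest levels of this hierarchy, the expected cell frequencies have closed forms, which the following sketch computes. The counts are made up and the function names are hypothetical. The second-order CFA base model has no closed form in general and would need an iterative fitting routine, which is not shown here.

```python
import math

def expected_null(table):
    """Null model (log m = lambda): every cell has the same expected frequency."""
    n, r = sum(table.values()), len(table)
    return {cfg: n / r for cfg in table}

def expected_main_effects(table):
    """Main effect (first-order CFA) model: e = N * product of marginal proportions.

    table maps configuration tuples to observed counts and must contain
    every cell of the cross-classification, including empty ones.
    """
    n = sum(table.values())
    d = len(next(iter(table)))
    margins = [{} for _ in range(d)]
    for cfg, m in table.items():
        for j, cat in enumerate(cfg):
            margins[j][cat] = margins[j].get(cat, 0) + m
    return {cfg: n * math.prod(margins[j][cfg[j]] / n for j in range(d))
            for cfg in table}

# Hypothetical counts for a 2 x 2 x 2 table (not data from the chapter).
counts = {(0, 0, 0): 30, (0, 0, 1): 10, (0, 1, 0): 12, (0, 1, 1): 8,
          (1, 0, 0): 9, (1, 0, 1): 11, (1, 1, 0): 10, (1, 1, 1): 30}

e_null = expected_null(counts)
e_main = expected_main_effects(counts)
for cfg in sorted(counts):
    print(cfg, counts[cfg], round(e_null[cfg], 2), round(e_main[cfg], 2))
```

Comparing the observed counts with the two columns of expected frequencies shows how the choice of base model changes which cells look discrepant.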

The second group of log-linear CFA base 
models contains regional base models. In con- 
trast to global models, regional models are used 
to analyze groups of variables. These groups 
can be predictors and criteria, or simply two 
or more groups of variables that are related to 
each other. In longitudinal research, the groups 
of variables may be observed at different points 
in time, and researchers ask whether patterns of 
categories at the first point in time allow one to 
predict patterns of categories at the next point 
in time. 


Base models that use known population 
probabilities 

The typical application of log-linear models 
uses sample information to estimate cell prob- 
abilities. However, occasionally, population 
probabilities are known as, for instance, the 
incidence rates of cancer, the gender distribu- 
tion in a particular age bracket, the rate of car 
accidents in Michigan, or the number of chil- 
dren born to women in Italy. Whenever pop- 
ulation probabilities are known, one can ask 
whether a sample can be assumed drawn from 
this population. If there are significant discrep- 
ancies, one may ask where they are located, 
and this is the domain of CFA. Types indicate 
where more cases can be found than expected 
based on population parameters, and antitypes 
indicate where the number of cases is smaller
than expected based on population parameters. 
In longitudinal research, population informa- 
tion is rarely available. Therefore, base models 
that reflect such information have not been used 
frequently. 


Base models based on a priori probabilities 

Of particular importance in longitudinal 
research is the possibility that the rate of change 
patterns does not only differ empirically but 
also theoretically, a priori. One frequently-used 
method of longitudinal CFA targets intraindividual
changes in a series of measures by creating
variables that indicate the linear, quadratic,
and higher-order (polynomial) trends. When
data are continuous, linear trends can be exam- 
ined using the methods of first differences, i.e., 
differences between time-adjacent scores. Sec- 
ond differences, i.e., differences between first 
differences, are used to analyze quadratic trends 
(details follow below). Of these differences, 
often just the signs are analyzed to indicate 
whether there is an increase or decrease, or 
an acceleration or a deceleration. It has been 
shown (von Eye, 2002, Ch. 8) that patterns of 
signs of these differences come with different 
a priori probabilities. CFA base models can take
these into account. 
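
One simple way to see why sign patterns differ in their a priori probabilities is to enumerate all rank orderings of the scores. The sketch below does this for first differences under the assumption, made here only for illustration, that all orderings are equally likely and that there are no ties. It is not claimed to be the derivation used in the cited work, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def sign_pattern_probabilities(n_occasions):
    """A priori probabilities of the sign patterns of first differences,
    assuming every rank ordering of the scores is equally likely (no ties)."""
    counts = Counter()
    for ranks in permutations(range(n_occasions)):
        # "+" marks an increase between adjacent occasions, "-" a decrease.
        signs = "".join("+" if later > earlier else "-"
                        for earlier, later in zip(ranks, ranks[1:]))
        counts[signs] += 1
    total = sum(counts.values())
    return {pattern: Fraction(c, total) for pattern, c in sorted(counts.items())}

probs = sign_pattern_probabilities(3)
for pattern, prob in probs.items():
    print(pattern, prob)
```

For three occasions, the monotonic patterns "++" and "--" each arise from one of the six orderings, whereas "+-" and "-+" each arise from two. A base model that treated all four patterns as equally likely would therefore misstate the expected frequencies.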


Base models that take distributional 
assumptions into account 
Recently (von Eye and Bogat, 2004; von Eye 
and Gardiner, 2004), a variant of CFA was pro- 
posed that allows one to examine sectors in 
the multivariate space for which the expected 
probability was estimated under the assump- 
tion of multinormality (Genz, 1992). Types (or 
antitypes) suggest that a sector contains more 
(or fewer) cases than expected based on the 
assumption of multinormality. Thus, CFA can 
be used as a method for testing the multi- 
normality assumption. The interesting aspect 
of this approach to testing for multinormality 
is that results indicate which sectors exactly 
show significant departures from multinormal- 
ity. If it can be assumed that samples do come 
from multinormal populations, one can ask why 
these sectors show discrepant probabilities, and 
whether selective resampling can be considered 
to make a sample more representative of the 
population it was drawn from. 

In the following sections, we present methods 
of CFA for the analysis of longitudinal data. We 
begin with the analysis of two-wave data. 


3 CFA of longitudinal data 


In this section, we present a selection of the 
many methods that have been proposed for 
the configural analysis of longitudinal data. We 
begin with a method that allows one to deter- 
mine whether temporal patterns of the same 
variable, observed over two or more occasions, 
are specific to one of two or more groups under 
study. The second approach to be presented 
allows one to analyze patterns of trends. The 
third approach focuses on symmetry tables. 


3.1 CFA of group-specific temporal patterns 


Consider the case in which a categorical 
variable was observed repeatedly in two or 


more groups of respondents. We then can 
ask whether the transition patterns are group- 
specific. In the simplest case, we have a 
dichotomous variable that was observed twice. 
Let the categories of this variable be labeled 
with 1 and 2. From two categories, four tran- 
sition patterns result: 11, 12, 21, and 22. Still 
dealing with the simplest case, we observe these 
transition patterns in two groups of respon- 
dents. Crossing the group variable with the tran- 
sition patterns yields the eight patterns 111, 
112, 121, 122, 211, 212, 221, and 222. Let 
the grouping variable be the last of the three 
variables in the crossing patterns. The first 
of these eight configurations describes those 
respondents who show Category 1 on both occa- 
sions, and these respondents are from Group 
1. Accordingly, Configuration 212 describes 
those respondents who show Category 2 at the 
first occasion, Category 1 at the second occa- 
sion, and are members of the second group. 
Table 20.1 illustrates this 2 × 2 × 2 arrangement.


Table 20.1 2 × 2 × 2 cross-classification for
comparison of transition patterns in two groups

            Group 1                  Group 2
            Time 2                   Time 2
Time 1      1          2             1          2
1           $m_{111}$  $m_{121}$     $m_{112}$  $m_{122}$
2           $m_{211}$  $m_{221}$     $m_{212}$  $m_{222}$


For the following considerations, we assume
that no population parameters are known and
no a priori probabilities exist. In addition, we
assume multinomial sampling. Now, the question
as to whether transition patterns are group-specific
can be answered in more than one
way. The routine way would involve calculating
odds ratios (see Rudas, 1998). A first odds
ratio is


$$\theta_1 = \frac{p_{111} / p_{121}}{p_{112} / p_{122}},$$


where $p_{ijk}$ indicates the probabilities of the eight
patterns and the grouping variable is the last in 
the subscripts. This odds ratio tells us whether 
the transition from 11 to 12 is observed at a dif- 
ferent rate in Group 1 than in Group 2. The null 
hypothesis according to which the logarithm of 
the odds ratio (the log odds) is equal to zero 
(even odds), i.e., that the transition rates from 
11 to 12 are the same in the two groups, can be 
tested using the standard normal 


$$z_{\log \theta_1} = \frac{\log \theta_1}{\left( \sum_{j,k} \frac{1}{m_{ijk}} \right)^{0.5}},$$


for i = 1, and j, k ∈ {1, 2}.

Accordingly, we can ask whether the transi- 
tion from 21 to 22 is the same in the two groups, 
and calculate 


$$\theta_2 = \frac{p_{211} / p_{221}}{p_{212} / p_{222}}.$$


The significance test applies accordingly. In a
follow-up step, we can compare these two odds
ratios (or these two log odds) by calculating
$\theta = \theta_1 / \theta_2$. This ratio tells us whether the group
differences in transition rates are the same for
the transition from Category 11 to 12 as for the
transition from Category 21 to 22. The significance
test statistic for $\log \theta$ is


$$z_{\log \theta} = \frac{\log \theta}{\left( \sum_{i,j,k} \frac{1}{m_{ijk}} \right)^{0.5}},$$


for i, j, k ∈ {1, 2}.
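
The odds ratio analysis can be sketched as follows. The frequencies are made up, the function names are hypothetical, and the standard error is assumed to be the usual large-sample one, i.e., the square root of the sum of reciprocal observed cell frequencies. Subscripts are ordered (Time 1, Time 2, Group).

```python
import math

def odds_ratio(m, i):
    """Odds ratio comparing the two groups on transitions i1 versus i2."""
    return (m[(i, 1, 1)] / m[(i, 2, 1)]) / (m[(i, 1, 2)] / m[(i, 2, 2)])

def z_log(theta, cell_frequencies):
    """z statistic for log(theta) with SE = sqrt(sum of 1 / frequency)."""
    se = math.sqrt(sum(1 / f for f in cell_frequencies))
    return math.log(theta) / se

# Hypothetical observed frequencies m[(Time 1, Time 2, Group)].
m = {(1, 1, 1): 20, (1, 2, 1): 10, (1, 1, 2): 15, (1, 2, 2): 30,
     (2, 1, 1): 12, (2, 2, 1): 18, (2, 1, 2): 10, (2, 2, 2): 25}

theta1 = odds_ratio(m, 1)
theta2 = odds_ratio(m, 2)
z1 = z_log(theta1, [f for (i, _, _), f in m.items() if i == 1])
z2 = z_log(theta2, [f for (i, _, _), f in m.items() if i == 2])

theta = theta1 / theta2          # ratio of the two odds ratios
z = z_log(theta, m.values())     # uses all eight cells

print(f"theta1 = {theta1:.3f}, z = {z1:.3f}")
print(f"theta2 = {theta2:.3f}, z = {z2:.3f}")
print(f"theta  = {theta:.3f}, z = {z:.3f}")
```

In this made-up example the first odds ratio differs significantly from 1 at the 5% level, whereas the second odds ratio and the ratio of the two do not.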

There can be no doubt that these odds ratios 
and the ratio of the two odds ratios allow 
us to analyze interesting and important ques- 
tions. We learn whether transition patterns dif- 
fer across groups and whether these differences 
carry over to other transitions. In addition, 
odds ratio analysis has important characteris- 
tics. For example, odds ratios are marginal-free. 
That is, odds ratios are not affected by extreme 


marginal distributions, as long as the ratios stay 
unchanged. 

However, CFA allows one to ask additional 
questions. Specifically, CFA allows one to ask 
whether the groups differ in individual pat- 
terns. For example, we can ask whether the 
two groups in the present example differ in the 
rates with which they show transition pattern 12.
To answer this question (and the corresponding
ones for the remaining three transition patterns),
we need to develop a base model. This
model must be specified such that types and 
antitypes can emerge only if the groups differ 
in the rates with which they exhibit a transition 
pattern. In other words, the correlation between 
the scores observed at the first and the second 
occasions must not lead to types and antitypes. 
In addition, the occurrence rates of the variable 
categories must not lead to types and antitypes 
either. 

The log-linear base model with these charac- 
teristics takes into account the following effects: 


• Main effects of all three variables (Time 1,
Time 2, and Group); taking the main effects
into account has the effect that differences
in occurrence rates of variable categories and
differences in group sizes do not lead to the
emergence of types and antitypes.

• Interaction between Time 1 and Time 2;
taking this interaction into account has the
effect that autocorrelations will not lead to
types and antitypes. This applies accordingly
if data from more than two occasions are
analyzed.


In contrast, none of the interactions of the 
grouping variable and the two observation 
points is taken into account. If one or more of 
these interactions exist, they will manifest in 
the form of types and antitypes. In brief, the 
hierarchical log-linear base model of choice is 


$$\log m = \lambda + \lambda_{ij}^{T1,T2} + \lambda_{k}^{G}.$$


The first term on the right side of the equation
is the constant. The second term is the inter- 
action between the two observation points, T1 
and T2. Because the model is hierarchical, the 
main effects of T1 and T2 are implied. The last 
term is the main effect of the grouping variable, 
G. In bracket notation, this model is [T1, T2][G]. 
In this model, only the interactions between 
G and the two observation points are missing. 
These are reflected in the terms [T1, G], [T2, G],
and [T1, T2, G]. If these interactions exist, the 
base model is rejected and types and antitypes 
exist. It is important to note that the CFA base 
model does take the main effects into account. 
It is thus marginal-dependent. When data from 
more than two occasions are analyzed, the base 
model takes the highest possible interaction 
among observation points into account. For 
example, for the four data waves T1, T2, T3,
and T4, the base model is [T1, T2, T3, T4][G].

Clearly, the only effects that, for the above 
base model, can lead to the emergence of types 
and antitypes, are those that link T1 and T2 
with G. These are the effects [T1, G], [T2, G], and 
[T1, T2, G]. There is no other way to contradict 
this base model, and the first, the most impor- 
tant condition for CFA base models is fulfilled. 
If the base model is rejected, we can examine 
cells for local model-data discrepancies. 
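
Because the base model [T1, T2][G] treats the transition pattern and the grouping variable as independent, its expected frequencies have a closed form: the pattern total times the group total, divided by N. The sketch below illustrates this with made-up frequencies and a hypothetical function name.

```python
def expected_t1t2_g(m):
    """Expected frequencies under the base model [T1, T2][G].

    m maps (Time 1, Time 2, Group) tuples to observed frequencies.
    """
    n = sum(m.values())
    pattern_totals, group_totals = {}, {}
    for (i, j, k), freq in m.items():
        pattern_totals[(i, j)] = pattern_totals.get((i, j), 0) + freq
        group_totals[k] = group_totals.get(k, 0) + freq
    return {(i, j, k): pattern_totals[(i, j)] * group_totals[k] / n
            for (i, j, k) in m}

# Hypothetical observed frequencies (not data from the chapter).
m = {(1, 1, 1): 20, (1, 2, 1): 10, (1, 1, 2): 15, (1, 2, 2): 30,
     (2, 1, 1): 12, (2, 2, 1): 18, (2, 1, 2): 10, (2, 2, 2): 25}

e = expected_t1t2_g(m)
for cfg in sorted(m):
    print(cfg, m[cfg], round(e[cfg], 2))
```

The expected frequencies reproduce the Time 1 × Time 2 table and the group sizes exactly, so any remaining discrepancies can only stem from interactions that involve the grouping variable.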

In the current context, we will proceed in 
a slightly different way than inspecting each 
individual cell. Instead, we inspect pairs of 
cells. This procedure allows us to compare the 
two groups of respondents. We thus perform a 
2-group repeated measures CFA — a model not 
discussed thus far in the CFA literature (see von 
Eye, 2002). 


Elements of 2-group CFA 

The function of the above log-linear model is 
to estimate the expected cell frequencies. These 
will, under the null hypothesis discussed in 
the introduction, be compared with the cor- 
responding observed frequencies. In 2-group 
CFA, this is done after pairs of cells are arranged 


in the format of a 2 x 2 cross-tabulation. The 
columns of this table are constituted by the 
two groups under study. One of the rows con- 
tains the two observed frequencies of the tran- 
sition pattern used to compare the two groups, 
the other row contains the observed frequen- 
cies summed over the remaining transition pat- 
terns. Thus, each comparison is performed in 
the context of the whole table. As was indicated 
above, the expected cell frequencies are esti- 
mated using a log-linear base model. Table 20.2 
illustrates this arrangement (see von Eye, 2002,
p. 175).

Cross-classifications such as the one depicted in
Table 20.2 can be analyzed using, for instance,
the exact Fisher test, odds ratios, the X²-test,
or a number of z-approximations. When the 
test suggests that the null hypothesis can be 
rejected, there is a relationship between group 
membership and transition pattern or, in other 
words, the transition pattern is group-specific. 
If a pattern is group-specific, it is said to 
constitute a discrimination type. 


Data example 

For the following example, we use data from 
a study on the development of aggression in 
adolescents (Finkelstein, von Eye and Preece, 
1994). In this study, 38 boys and 76 girls in 
the UK were asked to respond to an aggres- 
sion questionnaire in 1983, 1985, and 1987. 


Table 20.2 2 x 2 table for 2-group CFA

Configuration            Group                                       Row totals
                         G1                    G2
Comparison pattern ij    $m_{ijG1}$            $m_{ijG2}$            $m_{ij.}$
All others combined      $m_{..G1} - m_{ijG1}$ $m_{..G2} - m_{ijG2}$ $N - m_{ij.}$
Column totals            $m_{..G1}$            $m_{..G2}$            $N$

The average age at 1983 was 11 years. One
of the dimensions of aggression examined 
in this study was verbal aggression against 
adults (VAAA). In the present example, we ask 
whether the development of VAAA from 1983 
to 1985 is gender-specific. 

To analyze the data, we dichotomized the 
two measures of verbal aggression, VAAA83 
and VAAA85 at the median, with 1 indi- 
cating below and 2 indicating above average 
scores, and crossed the dichotomized variables 
with gender (1 = males, 2 = females). This 
resulted in the VAAA83 x VAAA85 x Gender 
cross-classification. The expected frequencies 
were estimated using the log-linear base model 


$$\log m = \lambda + \lambda^{VAAA83,VAAA85}_{ij} + \lambda^{G}_{k}$$

For the individual tests, we used the
z-approximation of the binomial test, and α was
protected using the Bonferroni procedure. The
adjusted α was α* = 0.0125. Table 20.3 displays
the results of 2-group CFA. 

The results in Table 20.3 show that the first 
three transition patterns (from 1 to 1, from 1 to 
2, and from 2 to 1) contain slightly more male 
adolescents than one would expect from the 1:2 
ratio in the sample. Each of these transition pat- 
terns contains at least one indication of below 
average verbal aggression. However, none of 


these three differences is strong enough to allow 
for a significant discrimination between female 
and male respondents. However, transition pat- 
tern 22 does indicate a significant gender 
difference in transition pattern. It constitutes 
a discrimination type, indicating that signif- 
icantly more female than male respondents 
considered themselves above average in verbal 
aggression at both ages 11 and 13 years. 
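
The pairwise comparisons can be retraced with a short sketch. It takes the observed frequencies from Table 20.3 (males first, females second), forms the 2 x 2 table "pattern versus all other patterns" for each transition pattern, and computes the signed square root of Pearson's X² as the test statistic. That this is exactly the z-approximation used for the table is an assumption; it is chosen here because it agrees with the tabled statistics to three decimals. The names `two_group_z` and `one_sided_p` are illustrative only.

```python
import math

# Observed frequencies from Table 20.3, keyed by transition pattern
# (VAAA83, VAAA85); each value is (males, females).
observed = {
    "11": (15, 22),
    "12": (8, 14),
    "21": (9, 11),
    "22": (6, 29),
}


def two_group_z(pattern, table):
    """Signed root of Pearson's X^2 for 'pattern vs. all other patterns'.

    Assumption: this stands in for the z-approximation named in the text.
    """
    a, b = table[pattern]                       # comparison pattern row
    col1 = sum(m for m, _ in table.values())    # all males
    col2 = sum(f for _, f in table.values())    # all females
    c, d = col1 - a, col2 - b                   # all other patterns combined
    n = a + b + c + d
    cross = a * d - b * c
    x2 = n * cross ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.copysign(math.sqrt(x2), cross)


def one_sided_p(z):
    """One-sided tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))


alpha_star = 0.05 / len(observed)  # Bonferroni: four comparisons
results = {}
for pattern in observed:
    z = two_group_z(pattern, observed)
    results[pattern] = (z, one_sided_p(z))

discrimination_types = [p for p, (_, pv) in results.items() if pv < alpha_star]

for pattern, (z, pv) in results.items():
    print(f"{pattern}: z = {z:6.3f}, p = {pv:.3f}")
print("discrimination types:", discrimination_types)
```

A negative statistic means that the pattern contains fewer males than the column totals would suggest, which is the direction reported for pattern 22.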

This result cannot be reproduced using odds 
ratio or log-linear analysis. The hierarchical log- 
linear model [VAAA83, VAAA85][G, VAAA85] 
explains the data well (LR-X² = 2.75; df = 2;
p=0.25). Compared to the above base model, 
only the second term was added. However, 
this model only states that the auto-association 
between the two verbal aggression scores is 
significant, and that the second verbal aggres- 
sion measure is associated with Gender. The 
model does not allow one to talk about 
transition patterns at the level of individual 
patterns and their relationship with the classi- 
fication variable, Gender. Therefore, CFA and 
log-linear modeling can be seen as comple- 
menting each other, while answering different 
research questions. 


3.2 CFA of patterns of trends 


Table 20.3 2-group analysis of patterns of the development
of verbal aggression

Configuration   m     Statistic   p      Type?
111             15
112             22     1.132      .129
121              8
122             14      .336      .369
211              9
212             11     1.219      .111
221              6
222             29    -2.441      .007   Discrimination type

Differences between time-adjacent observations
tell us whether a later observed score is greater
than an earlier observed score. More specifically,
we calculate $\Delta y_i = y_{i+1} - y_i$. If
$\Delta y_i > 0$, then the score at Time i+1 is
greater than the score at Time i. Scores
$\Delta y_i$ are called first differences.
Positive first differences indicate an increase in
scores over time.

Accordingly, second differences can be
defined. Given the first differences, $\Delta y_i$,
one can calculate
$\Delta^2 y_i = \Delta y_{i+1} - \Delta y_i$.
Second differences indicate whether the trend in
the first differences changes over time. That is,
second differences tell us whether the first
differences are accelerated or decelerated over
time. Curvature can be examined by calculating
third and higher-order differences.
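
As a minimal sketch of these definitions, the helper below computes differences of any order by repeatedly subtracting each score from its successor. The function name `differences` and the example series are hypothetical and not taken from the study.

```python
def differences(scores, order=1):
    """Return the differences of the given order for an equidistant series.

    Each pass subtracts the earlier value from the later one,
    so the result is shorter than the input by `order` values.
    """
    result = list(scores)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


series = [2, 5, 7, 6]                    # hypothetical scores at four occasions
first = differences(series, order=1)     # increase, increase, decrease
second = differences(series, order=2)    # both negative: the trend decelerates

quadratic = [t * t for t in range(1, 6)]  # 1, 4, 9, 16, 25
print(first, second, differences(quadratic, order=2))
```

The quadratic series illustrates the general pattern: its first differences vary, but its second differences are constant.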

The method of differences has a number of 
important characteristics: 


1. The data analyzed with this method must be 
at the interval or ratio scale levels. 

2. First differences describe linear trend (for the 
use of differencing methods in time series 
analysis, see e.g. Chapters 34 and 36 in this 
volume); in this characteristic, they are com- 
parable to first-order polynomials. 

3. Second differences describe changes in the 
linear trend; in this characteristic, they are 
comparable to quadratic polynomials; this 
applies accordingly to higher-order differ- 
ences and polynomials. 

4. If the first differences are constant, the series 
of scores can be described by a linear regres- 
sion line. 

5. If the first differences vary, but the sec- 
ond differences are constant, the series of 
scores can be described by a quadratic func- 
tion; this applies accordingly to higher-order 
differences. 

6. Analyzing first differences reduces the num- 
ber of available scores in the series by one; 
analyzing second differences reduces the 
number of available scores by two, etc. 

7. Data points must be equidistant.

8. For k repeated observations, testing change
parameters for polynomials of order up to
k - 1 becomes possible.


The method of differences just described
is called the method of descending differences if
later observations are subtracted from earlier
ones. The method of ascending differences
involves subtracting earlier observations
from later ones. The method of central differences
subtracts scores from a common reference
point, typically the mean or median.


Data example 

In the following application example, we use 
the data from the Finkelstein et al. (1994) 
study again. In addition to verbal aggression, 
aggressive impulses were assessed. We now 
ask whether there are types or antitypes 
of change patterns in linear and quadratic 
trends in aggressive impulses. In a first 
step, we calculate the difference scores. We
obtain the variables DAI1 = AI85 - AI83,
DAI2 = AI87 - AI85, and DAI2ND =
(AI87 - AI85) - (AI85 - AI83), where AI83,
AI85, and AI87 are the aggressive impulse 
scores for the years 1983, 1985, and 1987, 
respectively. 

The reason why we include the quadratic 
trend in addition to the linear one is that, at 
least for some of the adolescents, the changes 
in aggressive impulses cannot be satisfactorily 
described by only linear trends. Figure 20.1 
shows all 114 trajectories. Clearly, some of the 
trajectories are U-shaped, whereas others are 
inversely U-shaped. 

In the next step, the first and second differ- 
ences were dichotomized at the zero point, with 
2 indicating positive differences and 1 indicat- 
ing negative differences. The resulting two val- 
ues for the first differences between 1985 and 
1983 and between 1987 and 1985 thus indicate 
a positive linear trend for a score of 2 and a neg- 
ative linear trend for a score of 1. Accordingly, 
the resulting values for the quadratic trend indi- 
cate a U-shaped trend for a score of 2 (accel- 
eration) and an inversely U-shaped trend for a 
score of 1 (deceleration). 

Figure 20.1 Parallel coordinate display for
self-perceived aggressive impulses in the years
1983, 1985, and 1987


Using CFA, we now ask whether particular
patterns of first and second differences occur
more often or less often than expected. To
specify expectation, we use the base model of
variable independence, i.e.,


$$\log m = \lambda + \lambda^{DAI1}_{i} + \lambda^{DAI2}_{j} + \lambda^{DAI2ND}_{k}$$


From this base model, types and antitypes can
emerge if any of the pairwise or three-way
interactions exist. We adjust α using the Bonferroni
procedure, which results in α* = 0.00625, and
we use the z-test. Table 20.4 displays results of
this first-order CFA.
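
The expected frequencies under variable independence can be sketched as follows, using the observed frequencies listed in Table 20.4. Each expected frequency is the product of the three one-way marginal totals divided by N², and the statistic is taken to be z = (observed - expected) / sqrt(expected). That form of the z-test is an assumption, adopted because it matches the tabled values; the helper `margin` is illustrative.

```python
import math

# Observed frequencies from Table 20.4; key order is DAI1, DAI2, DAI2ND.
observed = {
    "111": 20, "112": 13, "121": 0, "122": 26,
    "211": 36, "212": 0, "221": 15, "222": 4,
}
n = sum(observed.values())


def margin(position, category):
    """One-way marginal total for one category of one variable."""
    return sum(m for cfg, m in observed.items() if cfg[position] == category)


# Independence model: product of the three marginal proportions times N.
expected = {
    cfg: margin(0, cfg[0]) * margin(1, cfg[1]) * margin(2, cfg[2]) / n ** 2
    for cfg in observed
}

# Assumed z statistic: standardized raw residual.
z = {cfg: (observed[cfg] - expected[cfg]) / math.sqrt(expected[cfg])
     for cfg in observed}

alpha_star = 0.05 / len(observed)  # Bonferroni over all eight cells

for cfg in observed:
    print(f"{cfg}: m = {observed[cfg]:2d}, e = {expected[cfg]:7.3f}, z = {z[cfg]:6.3f}")
```

Note that this model assigns positive expected frequencies to every cell, including the two cells with zero observations.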


The results in Table 20.4 suggest that 2 
types and 2 antitypes exist. The first type 
is constituted by Configuration 122 (see also 
Figure 20.2). This pattern is characterized by a 
decrease in aggressive impulses from 1983 to 
1985 that is followed by an increase. This pat- 
tern is complemented by a U-shaped quadratic 
trend; 26 respondents displayed this pattern, 
but only about 9 had been expected. The sec- 
ond type (Configuration 211) shows the oppo- 
site pattern. An increase in aggressive impulses 
is followed by a decrease. This pattern is com- 
plemented by an inversely U-shaped quadratic 
trend. 36 adolescents were observed to dis- 
play this pattern — about 1.5 times as many as 
expected. 

The antitypes are constituted by Config- 
urations 121 and 212. These are impossi- 
ble patterns! The first shows a decrease 
in aggressive impulses that is followed by 
an increase, complemented by an inversely 
U-shaped quadratic trend. This pattern is con- 
tradictory. Therefore, validly, the program did 
not find anyone showing this pattern. This 
applies accordingly to Configuration 212 which 
describes an increase, followed by a decrease, 
complemented by a U-shaped quadratic trend. 
Figure 20.2 shows the temporal curves for the 
six possible patterns. 
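
Why Configurations 121 and 212 cannot occur follows from DAI2ND = DAI2 - DAI1: a negative first difference followed by a positive one always yields a positive second difference, and vice versa. The sketch below checks this by brute force over a hypothetical score range of 1 to 7. Series with a zero difference are skipped, because the text does not say how zeros were coded; that omission and the function names are assumptions of the sketch.

```python
from itertools import product


def code(diff):
    """Dichotomize at zero: 2 = positive difference, 1 = negative difference."""
    if diff == 0:
        raise ValueError("zero differences are not covered by this sketch")
    return 2 if diff > 0 else 1


def configuration(ai83, ai85, ai87):
    """Configuration in the order DAI1, DAI2, DAI2ND."""
    dai1 = ai85 - ai83
    dai2 = ai87 - ai85
    dai2nd = dai2 - dai1
    return f"{code(dai1)}{code(dai2)}{code(dai2nd)}"


scores = range(1, 8)  # hypothetical score range, not the study's scale
seen = set()
for a, b, c in product(scores, repeat=3):
    if a == b or b == c or (c - b) == (b - a):
        continue  # skip series with a zero first or second difference
    seen.add(configuration(a, b, c))

all_configurations = {"".join(t) for t in product("12", repeat=3)}
impossible = sorted(all_configurations - seen)
print("observable:", sorted(seen))
print("impossible:", impossible)
```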


Table 20.4 First-order CFA of first and second differences of 3 waves of
data on aggressive impulses

Configuration^a   $m_{ijk}$   $\hat{m}_{ijk}$   z        p       Type/Antitype?
111               20          22.241            -.475    .3173
112               13          13.470            -.128    .4491
121                0          14.505           -3.809    .0001   Antitype
122               26           8.785            5.808    .0000   Type
211               36          20.733            3.353    .0003   Type
212                0          12.557           -3.544    .0002   Antitype
221               15          13.521             .402    .3438
222                4           8.189           -1.464    .0716

^a The order of variables is DAI1, DAI2, and DAI2ND


Figure 20.2 Observed aggressive impulse scores for six longitudinal configuration trends


Now, the inspection of Table 20.4 shows that
the base model estimated expected frequencies
for the two impossible patterns. Only because of
these expected frequencies can these two
configurations be said to constitute antitypes. It is,
however, a mistake to estimate expected
frequencies for cells that cannot contain any case,
i.e., for cells with structural zeros. We therefore
recalculate the CFA, taking the two structural
zeros into account. The results are summarized
in Table 20.5. Please note that, because of the
two structural zeros, the Bonferroni-adjusted α
now is 0.008 instead of 0.006 as in Table 20.4.

The log-linear base model for the results in
Table 20.4 came with an overall goodness-of-fit
LR-X² = 88.36 (df = 4; p < 0.01). The
log-linear base model for the results in Table
20.5 comes with an overall goodness-of-fit
LR-X² = 25.39 (df = 2; p < 0.01). The difference
between these nested models, ΔX² = 62.97, is
significant (Δdf = 2; p < 0.01). This result
indicates that taking into account the structural
zeros improves the model considerably. How- 
ever, the base model is still rejected. Therefore, 
we can expect types and antitypes to emerge. 
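
One way to obtain expected frequencies that respect structural zeros is iterative proportional fitting of the main-effect model over the remaining cells. The sketch below is such an approach under stated assumptions: the source does not name the estimation algorithm, the function name `fit_with_structural_zeros` and the fixed number of sweeps are illustrative, and the observed frequencies are those of Table 20.4.

```python
import math

observed = {
    "111": 20, "112": 13, "121": 0, "122": 26,
    "211": 36, "212": 0, "221": 15, "222": 4,
}
structural_zeros = {"121", "212"}


def fit_with_structural_zeros(observed, zeros, sweeps=500):
    """Iterative proportional fitting of the main-effect model.

    Structural-zero cells start at 0 and therefore stay at 0; all
    other cells are rescaled until the three one-way margins of the
    fitted table match the observed margins.
    """
    fitted = {cfg: (0.0 if cfg in zeros else 1.0) for cfg in observed}
    for _ in range(sweeps):
        for position in range(3):
            for category in "12":
                cells = [c for c in fitted if c[position] == category]
                target = sum(observed[c] for c in cells)
                current = sum(fitted[c] for c in cells)
                factor = target / current
                for c in cells:
                    fitted[c] *= factor
    return fitted


expected = fit_with_structural_zeros(observed, structural_zeros)

# Likelihood-ratio statistic over the cells with observed cases.
lr_x2 = 2.0 * sum(m * math.log(m / expected[c])
                  for c, m in observed.items() if m > 0)

for cfg in observed:
    print(f"{cfg}: e = {expected[cfg]:7.3f}")
print(f"LR-X2 = {lr_x2:.2f}")
```

With six estimable cells and four independent parameters, two degrees of freedom remain, in line with the value reported for the model with structural zeros.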

The two types from the first analysis still 
exist. They may be less pronounced than in Table
20.4 (as seen by the decrease in the size of the 
raw residual), but they are still strong enough 
to be significant even when the conservative 
Bonferroni procedure is applied. 

Naturally, the next question is whether the 
non-empirical element in the Configurations 
122 and 211 prevents them from being
interesting (for a discussion of non-empirical or
a priori elements in empirical research, see
Brandtstädter, 1982; Smedslund, 1984). The
answer to this question is no. CFA does not 
answer the question as to whether configura- 
tions are possible (Configurations 122 and 211 


are possible). Instead, CFA asks whether con- 
figurations are observed at a different rate than 
expected. The information about the quadratic 
trend is, in these two configurations, not infor- 
mative. Therefore, appending the “1” to the 
“12” and the “2” to the “21” does not alter 
the picture. The types therefore indicate that the 
trends “12” and “21” were observed more often 
than expected. However, a “1” or a “2” does 
carry additional information for the Configura- 
tions 11. and 22., where the period indicates 
that either of the shapes of the quadratic trend 
could be considered. 

We conclude that the present analysis con- 
tains three groups of cells. In the first, all vari- 
ables carry information. These are cells 111, 
112, 221, and 222. In the second group of cells, 
only two of the three variables carry informa- 
tion. These are cells 122 and 211. The cells in 
the first two groups are possible in the sense 
that these patterns can be observed. The third 
group of cells contains impossible patterns. 
This group comprises Cells 121 and 212. In the 
analysis, these cells have to be declared struc- 
tural zeros. 


3.3 CFA under consideration of a priori 
probabilities 


Table 20.5 First-order CFA of first and second differences of 3 waves of
data on aggressive impulses, structural zeros taken into account

Configuration^a   $m_{ijk}$   $\hat{m}_{ijk}$   z        p       Type/Antitype?
111               20          27.886           -1.493    .0854
112               13          17.234           -1.020    .0783
121                0           0                -        -
122               26          13.880            3.253    .0006   Type
211               36          23.880            2.480    .0066   Type
212                0           0                -        -
221               15          19.234            -.965    .1672
222                4          11.886           -2.287    .0111

^a The order of variables is DAI1, DAI2, and DAI2ND


By far the majority of CFA applications
estimate expected cell frequencies from the data.
However, there exist situations in which
determining expected cell frequencies from a priori
probabilities is an option. CFA of differences 
is such a situation. For first differences, we 
illustrated this using the data situation in the 
last example. Consider a dataset from three 
points in time, where all data points are dif- 
ferent (for the situation in which data points 
can be unchanged, see von Eye, 2002). Let the 
scores be 1, 2, and 3. For these scores, the pos- 
sible sequences are given in Table 20.6, along 
with the resulting sign patterns, their frequen- 
cies and probabilities. 

Clearly, patterns +- and -+ come with a
probability that is twice that of the probabilities
of patterns ++ and --. In addition, these
differences are not a matter of data characteristics.
They are a priori and will be found in every 
dataset with three or more observation points. 
Researchers may wish to take the different a
priori probabilities into account. Let the
probability of the jth sign pattern of the ith variable
be $\pi_{ij}$. Then, the expected cell frequency for
this transition pattern is estimated to be
$e_{ij} = \pi_{ij} N$. If the transition patterns for
two variables are crossed, the expected cell
frequencies can be estimated as
$e_{ij,kj'} = \pi_{ij} \pi_{kj'} N$, where j indicates
the sign pattern of variable i and j' indicates the
sign pattern of variable k.
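
The a priori probabilities can be derived by enumerating all orderings of three distinct scores, as in the following sketch. Signs are computed here as later minus earlier; using the opposite convention only swaps the labels and leaves the probabilities unchanged. The names `sign_pattern` and `expected_frequency` are illustrative, and N = 114 is the sample size of the aggression example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def sign_pattern(sequence):
    """Signs of the differences between adjacent scores (later minus earlier)."""
    return "".join("+" if later > earlier else "-"
                   for earlier, later in zip(sequence, sequence[1:]))


# All six orderings of the distinct scores 1, 2, 3 are equally likely a priori.
counts = Counter(sign_pattern(p) for p in permutations((1, 2, 3)))
total = sum(counts.values())
prior = {pattern: Fraction(c, total) for pattern, c in counts.items()}


def expected_frequency(pattern, n):
    """Expected cell frequency e = pi * N for one sign pattern."""
    return float(prior[pattern] * n)


for pattern in ("++", "+-", "-+", "--"):
    print(pattern, prior[pattern], expected_frequency(pattern, 114))
```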

To illustrate, we reanalyze the data in 
Table 20.4, taking into account the a priori 


probabilities of the four different transition pat- 
terns. Table 20.7 shows the results. 

The cell probabilities in Table 20.7 were 
calculated in three steps. First, the a priori
probabilities were determined as described
above. The same probabilities result as given 
in Table 20.6. Second, taking into account the 
specific data situation in Table 20.5 (struc- 
tural zeros), the probabilities for negative sec- 
ond differences were weighted with 59/114, 
and the probabilities with positive second dif- 
ferences were weighted with 55/114, to reflect 
the marginal probabilities of the two signs of the 
second differences. Finally, the structural zeros 
were blanked out, and, therefore, the probabil- 
ities of patterns 122 and 211 were multiplied 
by 2. 

To give an example, the a priori probability of 
Configuration 111 is 1/6. This is weighted with 
59/114, which yields $\pi_{111} = 0.16667 \cdot 59/114 = 0.0863$.
Multiplied by N = 114, we obtain the
estimated expected cell frequency given in 
Table 20.7. 

The results in Table 20.7 show that there
are no types and antitypes when a priori
probabilities are taken into account. It is a
typical result that taking different information into
account when estimating expected cell
frequencies leads to different type and antitype
patterns. Currently, the user has to make a decision
as to whether to consider data characteristics
or a priori probabilities when estimating cell
probabilities. The currently available software
does not allow one to take into account both.


Table 20.6 Sequences of differences from the scores 1, 2, and 3, their sign patterns,
frequencies, and probabilities

Sequences   Differences between   Sign pattern   Frequency of   Probability of
            adjacent scores                      pattern        pattern
123         -1, -1                --             1              .167
132         -2, 1                 -+             2              .333
213         1, -2                 +-             2              .333
231         -1, 2                 -+
312         2, -1                 +-
321         1, 1                  ++             1              .167


Table 20.7 First-order CFA of first and second differences of 3 waves of data on aggressive
impulses, structural zeros and a priori probabilities taken into account

Configuration   $m_{ijk}$   Cell prob.   $\hat{m}_{ijk}$   z        p       Type/Antitype?
111             20          .0863         8.16             3.242    .0006
112             13          .0804        10.83             1.266    .1027
121              0          0             -                -
122             26          .1608        18.33             1.791    .0367
211             36          .1725        19.67             3.683    .0001
212              0          0             -                -
221             15          .0863         9.83             1.648    .0497
222              4          .0804         9.17            -1.706    .0440


3.4 CFA of transitions from one point 
in time to the next 


When categorical variables are observed over 
time, the cells in the cross-classifications of the 
observed variables indicate exactly where an 
individual came from and where the individual 
went. For example, Pattern 13 indicates that an 
individual endorsed Category 1 at the first and 
Category 3 at the second point in time. Here, we 
consider two CFA base models that represent 
two sets of hypotheses. 


First-order CFA of transitions between two 
points in time 

The first model to be considered is the model of 
first-order CFA, i.e., the main effect model. As 
was indicated above, this is a log-linear main 
effect model. For the case of the repeatedly 
observed variable A, this is the model 


$$\log m_{ij} = \lambda + \lambda^{A_1}_{i} + \lambda^{A_2}_{j}$$


where i indicates the rows (categories endorsed
at Time 1) and j indicates the columns 
(categories endorsed at Time 2) of the cross- 
classification. Alternatively, the expected cell 


frequencies can also be estimated using the
well-known X² formula


$$e_{ij} = \frac{m_{i.} \, m_{.j}}{N}$$


where $m_{i.}$ and $m_{.j}$ indicate the row sums and the
column sums. It should be noted that, for the 
present kind of data analysis, it is not required 
that one variable is observed two or more times. 
The present variant of CFA can be performed 
even if two different variables are observed at 
the two points in time. To make this section eas- 
ier to compare with the following, we use the 
example of one repeatedly observed variable. 
In either case, resulting types (and antitypes) 
indicate that certain transitions are more (or 
less) likely than expected using the base model 
of independence between Time 1 and Time 2 
observations. 

When one variable is observed repeatedly, 
one would expect types to emerge for the main 
diagonal, and antitypes for the off-diagonal 
cells. This result would correspond to a strong
autocorrelation. However, in many
applications, not all diagonal cells turn out to be types,
and not all off-diagonal cells turn out to be
antitypes. In other words, CFA allows one to
determine which categories carry an
autocorrelation (if it exists at all).


Data example

In the following example, we use the vaca- 
tion data that were analyzed by von Eye and 
Niedermeier (1999). At Time 1, a sample of 89 
children was asked where their families had 
spent their last vacations. Responses indicating 
that the families had spent their vacations at a 
beach, at an amusement park, or in the moun- 
tains were included in the analysis. At Time 
2, the same children were asked where they 
would like to spend their next family vacations. 
Again, beach, amusement park, and mountains 
were the options. Using first-order CFA, we 
now answer the question as to whether cer- 
tain transition patterns are particularly likely or 
particularly unlikely. To obtain the results in 
Table 20.8, we used the base model of variable 
independence, the z-test and the
Bonferroni-adjusted α* = 0.0056.

Table 20.8 shows that vacation preferences 
in children are stable. Each of the diagonal 
cells (Cells 11, 22, and 33) constitutes a type, 
indicating that children typically prefer repeat- 
ing a vacation over trying something new. Of 
the off-diagonal cells, only one constitutes an 
antitype, Cell 13. This cell indicates that it is 
rather unlikely that children who spent their 
last vacations at the beach will opt for the 


mountains as their next vacation place. Each of 
the off-diagonal cells was observed less often 
than expected. However, only pattern 13 con- 
stituted an antitype. 
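
The type and antitype decisions for the vacation data can be retraced with the sketch below. It uses the observed frequencies of Table 20.8, the expected frequencies given by row sum times column sum divided by N, the z statistic (observed - expected) / sqrt(expected), and a one-sided normal tail probability compared with the Bonferroni-adjusted threshold. The exact form of the z-test and the one-sided p are assumptions; the category coding 1 = beach, 2 = amusement park, 3 = mountains is inferred from the order in which the text lists the options.

```python
import math

# Rows: vacation reported at Time 1; columns: preference stated at Time 2.
# Assumed coding: 1 = beach, 2 = amusement park, 3 = mountains.
observed = [
    [25, 10, 2],
    [3, 19, 1],
    [4, 3, 22],
]
n = sum(sum(row) for row in observed)
row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
alpha_star = 0.05 / (len(observed) * len(observed[0]))  # nine tests

types, antitypes = [], []
for i, row in enumerate(observed):
    for j, m in enumerate(row):
        e = row_sums[i] * col_sums[j] / n           # independence model
        z = (m - e) / math.sqrt(e)                  # assumed z statistic
        p = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # one-sided tail
        label = f"{i + 1}{j + 1}"
        if p < alpha_star:
            (types if z > 0 else antitypes).append(label)
        print(f"{label}: m = {m:2d}, e = {e:6.3f}, z = {z:6.3f}, p = {p:.4f}")

print("types:", types, "antitypes:", antitypes)
```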


4 CFA of symmetry patterns 


The analysis that was done to create the results 
in Table 20.8 was a routine first-order CFA.
We now ask a different question. We ask 
whether specific transition patterns are more 
(or less) likely to be observed than their coun- 
terpart transitions in the opposite directions. 
To answer this question, we use the concept of 
axial symmetry (Lawal, 1993, 2001; von Eye and 
Spiel, 1996). Let $p_{ij}$ be the probability of Cell ij
in a square contingency table. Then, this table
is said to display axial symmetry if $p_{ij} = p_{ji}$, for
i, j = 1, ..., I. To test the hypothesis that a table
displays axial symmetry, the expected cell fre- 
quencies are determined in two steps (note that 
this model is not log-linear; therefore, standard 
categorical data modeling software can be used 
only if it allows for user-specified vectors of the 
design matrix). First, the cells in the main diag- 
onal are blanked out. They do not belong to the 
cells involved in the hypothesis. Second, for
the off-diagonal cells, the expected frequencies
are estimated, for each pair ij, as


$$e_{ij} = e_{ji} = \frac{m_{ij} + m_{ji}}{2}$$


This estimation has the effect that the pairs of
cells that are mirrored at the main diagonal
contain the same estimated expected cell
frequencies. These are the configurations that indicate
transitions in the opposite directions.


Table 20.8 First-order CFA of preferred vacations, observed at two points in time

Configuration   $m_{ij}$   $\hat{m}_{ij}$   z        p       Type/Antitype?
11              25         13.303           3.207    .0007   Type
12              10         13.303           -.906    .1826
13               2         10.393          -2.603    .0046   Antitype
21               3          8.270          -1.832    .0334
22              19          8.270           3.731    .0001   Type
23               1          6.461          -2.148    .0158
31               4         10.427          -1.990    .0233
32               3         10.427          -2.300    .0107
33              22          8.146           4.854    .0000   Type


Table 20.9 First-order CFA of preferred vacations, observed at two
points in time

Configuration   $m_{ij}$   $\hat{m}_{ij}$   X²      p      Type/Antitype?
11              25         -
12              10         6.5              1.88    .170
13               2         3.0               .33    .064
21               3         6.5              1.89    .170
22              19         -
23               1         2.0               .50    .480
31               4         3.0               .33    .064
32               3         2.0               .50    .480
33              22         -

For use in CFA, estimation proceeds as
described. The CFA tests proceed as usual. Only
the protection of α differs from the usual
procedure. The number of tests is smaller than in
standard applications of CFA. The number of
symmetry pairs in a square table is
$\binom{I}{2} = \frac{I(I-1)}{2}$,
where I is the number of rows and columns in the
table. This number is smaller than I². The
protection of α is thus less restrictive than in
standard CFA. Table 20.9 displays the reanalysis of
the data from Table 20.8 under the hypothesis
of axial symmetry. We use the X²-test and the
Bonferroni-adjusted α* = 0.05/3 = 0.0167.
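
The two estimation steps and the adjusted threshold can be sketched as follows for the vacation data of Table 20.8. The diagonal is skipped, each mirrored pair receives the mean of its two observed frequencies as expected frequency, and the per-cell X² component (observed - expected)² / expected is computed. Treating the tabled statistic as this per-cell component is an assumption; the dictionary name `symmetry` is illustrative.

```python
# Rows: Time 1 vacation; columns: Time 2 preference (Table 20.8).
observed = [
    [25, 10, 2],
    [3, 19, 1],
    [4, 3, 22],
]
size = len(observed)

# Symmetry pairs are the cells above the diagonal with their mirror images.
pairs = [(i, j) for i in range(size) for j in range(i + 1, size)]
alpha_star = 0.05 / len(pairs)  # I(I-1)/2 tests instead of I^2

symmetry = {}
for i, j in pairs:
    e = (observed[i][j] + observed[j][i]) / 2      # same for ij and ji
    x2_cell = (observed[i][j] - e) ** 2 / e        # per-cell X^2 component
    symmetry[(i + 1, j + 1)] = (e, x2_cell)
    print(f"pair {i + 1}{j + 1}/{j + 1}{i + 1}: e = {e:.1f}, X2 component = {x2_cell:.2f}")

print(f"number of tests = {len(pairs)}, alpha* = {alpha_star:.4f}")
```

Because both cells of a pair deviate from their common expected frequency by the same amount, the component is identical for a cell and its mirror image.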

Table 20.9 shows that there is not a sin- 
gle symmetry pair that violates the symmetry 
assumptions. We conclude that the probability 
of switching from vacation preference i to j is 
the same as the probability of switching from j 
to i. As was shown in the previous section, the 


activity in this matrix can be described using 
the method of first-order CFA. 


5 Discussion 


In this chapter, we described a small number 
of the possibilities that configural frequency 
analysis (CFA) offers for the analysis of long- 
itudinal data. Specifically, we looked at the 
analysis of transition patterns, trends, and 
symmetry. We also discussed the comparison 
of trends across two populations. These and 
all other configural approaches to the anal- 
ysis of cross-classification share the person- 
oriented perspective (Bergman and Magnusson, 
1997; von Eye and Bergman, 2003). Under this 
perspective, data are analyzed with the goal 
of making statements about people. The cur- 
rently dominating perspective leads to state- 
ments about variables and their relationships. 
For example, growth curve modeling (Bollen 
and Curran, 2006; see also Stoolmiller, Chapter 
32, in this volume) aims at describing
relationships among variables. In contrast, configural
frequency analysis aims at detecting salient pat- 
terns of change in people. 

Therefore, the types and antitypes that result
from configural analysis play a different role
than the residuals in models of categorical
data. Large residuals indicate where a particular
model does not describe the data well. Mod- 
els are usually expressed in terms of relations 
among variables. Thus, large residuals indicate 
which of the proposed variable relationships 
fail to describe the data. The implication of 
large residuals is that the model must either 
be rejected or modified such that the cor- 
respondence with the data becomes closer. 
In contrast, types and antitypes contradict a 
base model that is of substantive interest only 
because it takes those variable relationships 
into account that are not important. If types 
and antitypes emerge, the relationships that are 
deemed important must exist. However, results 
are not expressed in terms of these relation- 
ships. Instead, results are expressed in terms of 
the profiles (configurations) of those individu- 
als who were observed at rates contradicting the 
base model. In addition, instead of altering the 
base model, researchers then attempt an inter- 
pretation of the types and antitypes. 

This procedure will not prevent researchers 
from employing various CFA base models. How- 
ever, again, using several CFA base models on 
the same data does not mean that a base model 
is dismissed. Rather, it implies that the data are 
approached with different questions. For exam- 
ple, instead of performing a two-group analysis 
in our first data example, we could have per- 
formed a standard first-order CFA. This alterna- 
tive method of analysis may have led to types 
and antitypes also. However, these types would 
have to be interpreted individually. A gender 
comparison would not have been possible. 

Many more methods of longitudinal CFA 
have been discussed. These methods involve, 
for example, creating different kinds of differ- 
ences, the analysis of pre-post designs, CFA of 
changes in location, CFA of both trends and 
shifts in means, CFA of series that differ in 
length, CFA of control group designs, and the 
analysis of correlational patterns over time (for 
a description of these and other methods of 
longitudinal CFA, see von Eye, 2002). The large 


array of CFA methods for the analysis of longi-
tudinal data shows that the number of questions
that can be answered using CFA, i.e., the num-
ber of questions that can be approached from a
person-oriented perspective, is equally large.


Chapter 21


Analysis of longitudinal categorical data 
using optimal scaling techniques 
Catrien C. J. H. Bijleveld 


1 Introduction 


Optimal scaling techniques can be used flex- 
ibly for the analysis of mixed measurement 
level longitudinal data. After giving some basic 
technical background, it will be shown and 
illustrated how, by using a special set up of 
the data matrix, developmental patterns can 
be explored, and how various types of growth 
curves may be modeled. Ending with a real- 
life example from criminological research, it is 
shown how solutions can be interpreted and 
how development for subgroups of respondents 
can be visualized. One advantage of the tech- 
niques discussed is that they can handle miss- 
ing values and missing occasions flexibly and 
that they are not burdened by temporal depen- 
dence the way ordinary statistical techniques 
are; a disadvantage is that criteria for interpre- 
tation are fuzzy, and that no statistical tests are 
provided. 

This chapter discusses the exploratory analy- 
sis of longitudinal data. In doing so, it will focus 
on techniques that have essentially been devel- 
oped for the multivariate analysis of mixed 
measurement level variables. Examples are the 
analysis of the association between marital sta- 
tus, profession, and income, or between type 
of therapy, psychological complaints, and ther- 
apist attachment. In these examples, at least 


one variable has been measured at less than 
interval level, and is noncontinuous: the mea- 
surement scale of these variables consists of 
a set of categories. Categorical variables parti- 
tion the subjects into different categories, like 
“marital status” (married, divorced, single, wid- 
owed) or “offence” (property, violence, public 
order, other). In the literature such variables 
are also referred to as “ordered categorical” and 
“interval categorical”, or “discrete interval”; see 
Agresti (1990). 

The techniques developed for the analysis of 
such data that will be discussed here are gen- 
erally referred to as optimal scaling techniques, 
although there is more they have in common 
than their algorithm. Since the first publications 
on this type of technique, they have become 
much easier to use in software packages
such as SPSS, and to a lesser extent SAS. 

This easy availability of these techniques has 
not made for their widespread use, however. 
The optimal scaling techniques, while very 
flexible and accessible even for a nonstatisti- 
cal audience, have remained somewhat esoteric 
and have not yet become as widely used as sim- 
ilarly “novel” techniques, such as multilevel 
analysis. 

Optimal scaling techniques serve essentially
as a tool for interpretation, for structuring the
data. No distributional assumptions are made.
It is exactly this property which makes
optimal scaling techniques so eminently useful
for the analysis of patterns of change. The 
fact that there is temporal dependence in the 
data and that standard statistical tests cannot 
be performed, does not hamper these tech- 
niques. Because of their flexibility, they make 
it possible to essentially do-it-yourself model 
all kinds of growth curves, or dependencies, 
and assess these models as to their explana- 
tory contribution to theory, or as to exploration 
itself. Lastly, optimal scaling techniques have 
efficient options to deal with missing values. 
Dropout or attrition is a serious problem in 
many surveys, and is often especially problem- 
atic in longitudinal research, as dropout is, with 
increasing time points incremental, and as data 
from respondents lost to follow up are virtu- 
ally worthless if the analysis technique requires 
a complete series of measurements. As will be 
shown below, optimal scaling techniques are 
also very flexible in this respect, and make it 
possible to make maximum use of the data as 
they have been collected. For that reason as 
well, they are thus a commendable option for 
the analysis of longitudinal data. 

The fact that optimal scaling techniques do 
not make distributional assumptions is on the 
one hand an advantage, but may on the other 
hand also be a disadvantage. While fit measures 
are provided, no statistical test is provided for 
the goodness of fit of the model. Resampling
methods such as bootstrapping or jackknifing
may be used to arrive at such a test, but this
is cumbersome. This is probably, in part, the
reason that these techniques have not become 
part of the standard toolkit of the typical (social) 
scientist. A second reason may be that for inter- 
pretation no hard criteria are given such as rota- 
tion criteria in factor analysis, although general 
guidelines and rules of thumb are available. 

In the following, we will first briefly discuss
the optimal scaling techniques. Such a discussion
can be cursory only, and the reader is
referred to standard works such as De Leeuw
(1983), De Leeuw (1989), Gifi (1990), Van
de Geer (1993), Greenacre (1984), Greenacre
(1993), Nishisato (1994) and Van der Heijden
(1987) for more detail and depth. We will then
discuss two types of techniques, namely one-set
analysis and multiset analysis, and show how
these can be used to investigate longitudinal
data in general, and to model growth in
particular. While we give small artificial
examples throughout, we conclude with a real-life
example in which we investigate the development
of norm-transgressing behavior for three
cohorts of secondary school students over three
waves, followed by a number of less easily
accessible extensions to these techniques.
In our descriptions, we at times refer back to 
Bijleveld and Van den Burg (1998), who pre- 
sented a more extensive and detailed version of 
the reasoning in this chapter. 


2 Optimal scaling 


The various techniques in this chapter have 
in common that they give different values to 
existing category values. This process is called 
quantification or optimal scaling. It is opti- 
mal because a certain criterion is optimized. 
After the variables have been quantified they 
are treated as if they were continuous. 

The optimal scaling techniques use an alter- 
nating least squares (ALS) algorithm to arrive 
at a solution. This is not essential, as the same 
analyses can be performed using alternating 
maximum likelihood estimation as well (see De 
Leeuw, 2006a); also, more sophisticated opti- 
mization algorithms have recently been pro- 
posed (De Leeuw, 2005; 2006b). ALS algorithms 
consist of at least two alternating steps. In each 
step, a loss function is minimized. Alternating 
between the steps, the algorithm converges to 
one solution for the parameters of the (linear) 
analysis model as well as for the new values for 
the categories of the categorical variables. 

We distinguish between nominal, ordinal, 
and numerical variables. For nominal variables 
the values of the categories reflect similarity 
and dissimilarity. When quantifying the categories
of these variables, this information must
be preserved. For ordinal variables, the val- 
ues of the categories have no other information 
content than the partitioning and the ordering. 
When rescaling, the ordering must thus be pre- 
served, meaning that (weak) monotone transfor- 
mations are allowed. For numerical variables, 
essentially interval level variables, only linear 
transformations are permitted, as in ordinary 
multivariate analysis techniques. 

As stated, two kinds of models can be ana- 
lyzed using optimal scaling techniques. The 
first kind of model analyzes the relations 
between the variables all belonging to one set. 
Linear analogs of such models are principal 
components analysis, or factor analysis. We 
refer to its nonlinear analog as multiple cor- 
respondence analysis; we treat its extension, 
nonlinear principal components analysis, cur- 
sorily. The second type analyzes the relations 
between the variables distinguished into two or 
more sets. The best example of such a linear 
model is canonical correlation analysis, and its 
nonlinear parallel is called—unsurprisingly— 
nonlinear canonical correlation analysis, or in 
the case of more than two sets of variables, non- 
linear generalized canonical analysis. 


2.1 Multiple correspondence analysis 


Multiple correspondence analysis investigates 
the association between various categorical 
variables simultaneously. For investigating the 
bivariate association between two nominal variables,
cross-tabulations are generally used. If we
are investigating M variables, however, we are
faced with the task of inspecting M(M-1)/2
cross-tabulations of bivariate relations. If we
are interested in the multivariate associations
amongst the variables, all such bivariate
inspections are tedious and essentially
uninformative; we would rather summarize the
most pertinent multivariate relations in the data.


Multiple correspondence analysis is a useful 
technique for doing so. In multiple correspon- 
dence analysis, the categories of the nominal 
variables are quantified in such a way that 
the correlation or similarity between all quanti- 
fied variables is maximal. This is the criterion 
against which the quantification is optimized. 

A nominal variable is characterized by its so- 
called indicator matrix, the matrix of dummy 
variables that show to which categories a 
respondent belongs (1) and to which categories 
he or she does not belong (0). Suppose that we 
have measured N respondents on M categorical
variables. The indicator matrix of variable j
with $k_j$ categories is denoted by the matrix $I_j$,
which has size $N \times k_j$ (in the Gifi system $I_j$ is
generally referred to as $G_j$). The quantifications
of the categories are stacked in a vector $b_j$ (of
length $k_j$). Then the expression $I_j b_j$ (resulting
in a vector of length N) gives the transformed
or quantified variable: each element of this
matrix product is the quantification of the
category to which the respondent belongs.
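
As a small illustration of the indicator-matrix notation, the following Python sketch builds the indicator matrix of one nominal variable and forms the quantified variable. The observations and the quantification values are made up for illustration; the quantifications are not the result of an optimal scaling analysis, and the variable names are ours.

```python
# Hypothetical observations on one nominal variable for N = 5 respondents.
observations = ["married", "single", "married", "widowed", "single"]

# Categories in a fixed (alphabetical) order; k_j is the number of categories.
categories = sorted(set(observations))  # ['married', 'single', 'widowed']

# Indicator matrix I_j (N x k_j): 1 if the respondent belongs to the
# category, 0 otherwise. Every row contains exactly one 1.
indicator = [[1 if obs == cat else 0 for cat in categories]
             for obs in observations]

# Hypothetical category quantifications b_j (length k_j), chosen arbitrarily.
b = [0.8, -1.1, 0.6]

# Quantified variable I_j b_j (length N): each respondent receives the
# quantification of the category he or she belongs to.
quantified = [sum(row[c] * b[c] for c in range(len(categories)))
              for row in indicator]

print(indicator)
print(quantified)
```

Because every row of the indicator matrix contains a single 1, the matrix product simply looks up one quantification per respondent.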

As said, correspondence needs to be 
maximized between the variables. Maxi- 
mum similarity or maximum homogeneity is 
attained when all quantified variables are— 
simultaneously—as similar as possible. This is 
achieved by next introducing a vector of so- 
called object scores, that contain scores for 
the N respondents. This vector is referred 
to as z, and has length N. By maximizing 
the similarity between the quantified variables 
and these object scores, the correspondence 
between the variables is maximized. In formula
this is understood as minimizing the difference
between $I_j b_j$ and z, which means that for
achieving maximum similarity the following
loss function needs to be minimized:

$$\sigma(z, b_1, \ldots, b_M) = \sum_{j=1}^{M} \mathrm{SSQ}(z - I_j b_j) \qquad (1)$$


where SSQ (.) stands for the sum of squares of 
the elements of a vector, with elements writ- 
ten in deviation from the column mean, so 
that SSQ (.) corresponds with the usual sum of
squares notation. 

Of course, (1) can be solved trivially by setting
all elements of z equal to zero as well as
all elements of the $b_j$. Therefore some kind of
standardization has to be imposed: in this case it is
required that $z'z = N$. Also, the object scores
are generally written in deviation from their
mean. Solving expression (1) under these conditions
gives a solution in which each respondent
is assigned one object score, and each category
one quantification. As in comparable techniques,
such as principal components analysis,
higher-dimensional solutions can be reached
as well. To do so, for each subject, a point
in p-dimensional space is found, and, for each
category, p quantifications are found. This is
achieved by extending the N-dimensional vector
z to an $(N \times p)$ matrix Z (with $Z'Z = NI$,
where $I$ here denotes the $p \times p$ identity matrix), and
by combining the p category quantifications for
each category of each variable in M respective
$(k_j \times p)$-dimensional matrices $B_j$. The loss
function is extended similarly and optimization
otherwise is the same.
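
To make loss function (1) and the need for standardization concrete, here is a minimal Python sketch. The data, the object scores, and all names are hypothetical; quantifications are stored as dictionaries, so that looking up a respondent's category plays the role of the matrix product of the indicator matrix and the quantification vector.

```python
def ssq(vector):
    """Sum of squares of the elements, taken in deviation from their mean."""
    mean = sum(vector) / len(vector)
    return sum((v - mean) ** 2 for v in vector)


def loss(z, columns, quantifications):
    """Loss (1): the sum over variables j of SSQ(z - I_j b_j).

    Each column lists the category of every respondent, and each b_j is a
    dict from category to quantification, so b_j[category] is the
    corresponding element of I_j b_j.
    """
    total = 0.0
    for column, b in zip(columns, quantifications):
        residual = [score - b[category] for score, category in zip(z, column)]
        total += ssq(residual)
    return total


# Hypothetical data: N = 4 respondents, M = 2 nominal variables.
columns = [["a", "a", "b", "b"], ["x", "y", "x", "y"]]

# Object scores in deviation from their mean, standardized so that z'z = N.
z = [1.0, 1.0, -1.0, -1.0]

# Quantifications equal to the mean object score per category.
centroids = [{"a": 1.0, "b": -1.0}, {"x": 0.0, "y": 0.0}]
# A different choice for the same z, which here gives a larger loss.
other = [{"a": 0.5, "b": -0.5}, {"x": 0.0, "y": 0.0}]
# Without standardization, all-zero scores and quantifications "solve" (1).
zeros = [{"a": 0.0, "b": 0.0}, {"x": 0.0, "y": 0.0}]

loss_centroids = loss(z, columns, centroids)
loss_other = loss(z, columns, other)
loss_trivial = loss([0.0] * 4, columns, zeros)
print(loss_centroids, loss_other, loss_trivial)
```

In this toy case the first variable reproduces the object scores perfectly and contributes nothing to the loss, while the second variable is unrelated to them and contributes the full sum of squares of z.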

The dimensions thus found are not only 
orthogonal, but they are also nested. By this 
we mean that the dimensions found are inde- 
pendent of the dimensionality of the solution 
sought: whether one seeks a solution in 2, 
3 or 4 dimensions, the lower dimensions are 
always identical, independent of any higher 
dimensions modeled. In other words: the first p
dimensions of a p+1, p+2, p+3, etc. solution
are stable.

Multiple correspondence analysis is also 
known as homogeneity analysis (see Gifi, 1990; 
Greenacre, 1984; Nishisato, 1980; Tenenhaus 
and Young, 1985; Van de Geer, 1993). In case 
of only two variables, multiple correspondence 
analysis is commonly referred to as correspon- 
dence analysis (see Benzécri, 1973). Multi- 
ple correspondence or homogeneity analysis is 
available in the computer program HOMALS 
(short for HOMogeneity analysis through Alternating
Least Squares), in the SPSS procedure
CATEGORIES (SPSS, 2006). Other programs
also perform multiple correspondence analysis, 
for instance the SAS procedure CORRESP (see 
SAS, 2006a). 

A number of guidelines exist for interpreting
solutions. A first measure is the loss, or
badness-of-fit; the corresponding goodness-of-fit
measure is the number of dimensions minus
the loss. The total goodness-of-fit can be broken
down to a fit measure per dimension, the so- 
called eigenvalue. Each dimension of the solu- 
tion has an eigenvalue, which corresponds to 
the mean variance of all the variables as quan- 
tified on that dimension. As such, the eigenval- 
ues reflect explained variance, per dimension, 
just like they do in techniques like principal 
components analysis, although here they reflect 
the variance of the quantified variables. The 
lower dimensions always explain the most vari- 
ance, and as one adds dimensions, the eigenval- 
ues of the added dimensions become lower and 
lower. As such, for choosing the dimensional- 
ity of the solution, a scree plot can be used, 
although in practice two dimensions are often 
chosen. 

Just as the importance of the respective 
dimensions can be assessed using the eigen- 
values, so the importance of the respective 
variables can be assessed using the so-called 
discrimination measures. The discrimination 
measures reflect to what extent the quanti- 
fied categories of a variable discriminate well 
between respondents. This translates into say- 
ing that the larger the spread of the quantified 
categories is, the larger the discrimination mea- 
sure. Conversely, a variable whose categories 
are very close together in the solution contributes
little to the structuring of respondents
in the solution. Discrimination measures
for variables are computed per dimension: so 
a variable in a two-dimensional solution has 
a separate discrimination measure for each of 
the two dimensions. The maximum is 1, the 
minimum is 0. Thus, it may be that a variable 
discriminates well between respondents on one 
dimension, as reflected in a high discrimination
value, but does not do so on another dimension. 
As such, discrimination measures are helpful 
tools in interpreting the various dimensions 
of the solution: by inspecting what variables 
“load” heavily on a dimension, and what vari- 
ables load more on other dimensions, one can 
interpret dimensions as scales. 

Eigenvalues and discrimination measures are 
thus both indicators of goodness-of-fit: the 
eigenvalue gives the explained variance of a 
dimension, and the discrimination measures 
tell to what extent the respective variables are 
associated with the variability in object scores. 
Eigenvalues and discrimination measures are 
intrinsically linked, with the eigenvalue of a 
dimension computed as the average of the dis- 
crimination measures of all variables for that 
dimension. Also, the sum of the p eigenvalues
equals the goodness-of-fit of the total solution.
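
The relations among discrimination measures, eigenvalues, and fit described above can be written out compactly. The symbols below are our own shorthand and not taken from the chapter: $\eta^2_{js}$ is the discrimination measure of variable j on dimension s, $\lambda_s$ the eigenvalue of dimension s, $b_{js}$ the vector of quantifications of variable j on dimension s, and $D_j$ the diagonal matrix of category frequencies of variable j. We assume object scores standardized to unit variance per dimension, and a loss expressed per respondent and per variable, which is the scale on which "number of dimensions minus loss" holds.

$$
\eta^2_{js} = \frac{1}{N}\, b_{js}' D_j\, b_{js}, \qquad
\lambda_s = \frac{1}{M} \sum_{j=1}^{M} \eta^2_{js}, \qquad
\text{fit} = \sum_{s=1}^{p} \lambda_s = p - \text{loss}.
$$

The first expression is the variance of the quantified variable, which is why a larger spread of the quantified categories gives a larger discrimination measure.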

Each respondent is assigned a point in 
p-dimensional space. These object scores can 
be represented in a p-dimensional space, form- 
ing a cloud of points. They have unit variance 
for each dimension. The category quantifica- 
tions are similarly placed in this space: each 
category receives a p-dimensional quantifica- 
tion, and these quantifications are as such also 
points in the same p-dimensional space. Just as 
the discrimination measures and the eigenval- 
ues are related, so is there a relation between 
the orthonormalized object scores and the cate- 
gory quantifications. More precisely, the quan- 
tification of a category is the average of the 
object scores of all respondents who scored 
that category. Conversely, the object score of a 
respondent is the average of all category quan- 
tifications of the categories he or she scored. 

This implies that respondents are generally 
placed in the p-dimensional space close to the 
categories they scored. Thus, we may char- 
acterize a respondent by the categories that 
are placed in his or her vicinity. This implies 
also that respondents who are placed in each 
other’s proximity in the solution will have
similar answering patterns. Thus, respondents
with similar answering patterns form clusters 
of object scores around or in the neighborhood 
of the categories that characterize them. This 
also implies that if all respondents share one 
characteristic, this characteristic will be placed 
centrally; by necessity, for every dimension the 
quantification will then be zero. Such centrally 
placed quantifications also occur for categories 
that are not really characteristic for certain 
(subgroups of) respondents. Thus, the tech- 
nique produces a solution in which homoge- 
neous subgroups of respondents with particular 
response patterns are identifiable. Respondents 
not belonging to such a subgroup, or categories 
not shared by such distinct groups, end up in 
the middle of the solution. 


Example of multiple correspondence analysis

For illustrative purposes a small dataset will 
be analyzed. The dataset contains information 
on nine colleagues of the author as well as 
the author herself. Data have been collected by 
observation on “Profession” (variable 1), “Gen- 
der” (variable 2), “Hair color” (variable 3), “Hair 
length” (variable 4) and “Body length” (vari- 
able 5). The data are in Table 21.1. 

We ran HOMALS in two dimensions for
this dataset. The total fit of the solution is
1.003236, which may be considered acceptable.
The eigenvalue of the first dimension was
.629; that of the second dimension was .374
(thus, the total fit equalled .629 + .374 = 1.003,
and the loss equalled 2 - 1.003 = .997).
The discrimination measures of the quantified 
variables on the two dimensions are given in 
Table 21.2. Averaging the discrimination mea- 
sures on the first and second dimensions gives 
the eigenvalues. 
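
The alternating steps can be tried out on the example data. The sketch below is not the HOMALS program: it is a minimal one-dimensional alternating least squares loop in plain Python, with an arbitrary starting vector, a fixed number of iterations instead of a convergence test, and function names of our own choosing. The body length label "shortish, but not diminutive" is abbreviated to "shortish". If Table 21.1 reproduces the analyzed data, the eigenvalue and discrimination measures should come out close to the first-dimension column of Table 21.2; the sign of a dimension is arbitrary, so the object scores may be mirrored.

```python
# Data of Table 21.1: profession, gender, hair color, hair length, body length.
DATA = {
    "Victor":  ("criminologist", "male",   "blond", "medium", "tall"),
    "Catrien": ("criminologist", "female", "blond", "long",   "shortish"),
    "Miriam":  ("criminologist", "female", "blond", "long",   "average"),
    "Gerben":  ("criminologist", "male",   "gray",  "short",  "tall"),
    "Kim":     ("criminologist", "female", "red",   "long",   "shortish"),
    "Samora":  ("criminologist", "female", "black", "long",   "shortish"),
    "Jorgen":  ("criminologist", "male",   "black", "short",  "average"),
    "Henk":    ("criminologist", "male",   "black", "short",  "shortish"),
    "Michael": ("criminologist", "male",   "gray",  "medium", "tall"),
}


def standardize(z):
    """Center z and rescale it so that z'z = N."""
    n = len(z)
    mean = sum(z) / n
    centered = [v - mean for v in z]
    scale = (n / sum(v * v for v in centered)) ** 0.5
    return [v * scale for v in centered]


def category_means(column, z):
    """Mean object score of the respondents in each category."""
    sums, counts = {}, {}
    for category, score in zip(column, z):
        sums[category] = sums.get(category, 0.0) + score
        counts[category] = counts.get(category, 0) + 1
    return {category: sums[category] / counts[category] for category in sums}


def homals_one_dimension(rows, iterations=200):
    n, columns = len(rows), list(zip(*rows))
    m = len(columns)
    z = standardize([float(i) for i in range(n)])  # arbitrary starting scores
    for _ in range(iterations):
        # Step 1: for fixed object scores, quantify each category as the
        # mean object score of the respondents who scored it.
        quants = [category_means(col, z) for col in columns]
        # Step 2: for fixed quantifications, set each object score to the
        # mean quantification of the categories scored, then standardize.
        z = standardize([sum(q[c] for q, c in zip(quants, row)) / m
                         for row in rows])
    quants = [category_means(col, z) for col in columns]
    # Discrimination measure: variance of each quantified variable.
    discrimination = [sum(q[c] ** 2 for c in col) / n
                      for q, col in zip(quants, columns)]
    # Eigenvalue: average discrimination measure over the variables.
    return z, quants, discrimination, sum(discrimination) / m


z, quants, discrimination, eigenvalue = homals_one_dimension(list(DATA.values()))
print(round(eigenvalue, 3), [round(d, 3) for d in discrimination])
```

Extending the sketch to two or more dimensions would additionally require orthogonalizing the columns of the object score matrix in every iteration, which is left out here.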

Table 21.1 Data in example dataset

Respondent  Profession     Gender  Hair color  Hair length  Body length
Victor      criminologist  male    blond       medium       tall
Catrien     criminologist  female  blond       long         shortish, but not diminutive
Miriam      criminologist  female  blond       long         average
Gerben      criminologist  male    gray        short        tall
Kim         criminologist  female  red         long         shortish, but not diminutive
Samora      criminologist  female  black       long         shortish, but not diminutive
Jorgen      criminologist  male    black       short        average
Henk        criminologist  male    black       short        shortish, but not diminutive
Michael     criminologist  male    gray        medium       tall


Table 21.2 Discrimination measures and
eigenvalues of example HOMALS solution

Variable     Dimension 1  Dimension 2
Profession   .000         .000
Gender       .836         .092
Hair color   .642         .756
Hair length  .889         .765
Body length  .780         .258
Eigenvalue   .629         .374


The discrimination measures show that, firstly
and as was to be expected, “Profession” is not
able to discriminate between respondents at all.
As all respondents were criminologists there is
simply no variance, and so there can be no
discrimination on this variable. Next, the
variables that discriminate most strongly on the
first dimension are “Gender” and “Hair length”,
and to a lesser extent “Hair color” and “Body
length”. In fact, “Gender” barely discriminates
on the second dimension. The first dimension
thus discriminates respondents on the basis of
gender as well as on variables that have a strong
association with gender, such as “Body length”
(men being taller) and “Hair length” (men
generally having shorter hair). The second
dimension, which has a much smaller eigenvalue,
is based mainly on the discrimination of
respondents with respect to hair color and hair
length. This is probably fairly coincidental here,
as out of three respondents with black hair, two
had it cut short, and of the four women with long
hair, two had blond hair. The relations are much
fuzzier, however, as is reflected in the lower
eigenvalue.


Table 21.3 gives all category quantifications
and the object scores. As can be verified,
the centroid of all object scores is zero, as is the
frequency-weighted centroid of the category
quantifications of any variable. Figure 21.1 plots
the category quantifications.


Table 21.3 Object scores and category quantifications

Object scores   Dimension 1  Dimension 2
Victor          -.932        1.176
Catrien         .992         .694
Miriam          .829         .380
Gerben          -1.259       -.222
Kim             1.315        .635
Samora          .952         -.355
Jorgen          -.301        -1.800
Henk            -.138        -1.487
Michael         -1.459       .980

Category quantifications  Dimension 1  Dimension 2
Gender
  female                  1.022        .338
  male                    -.818        -.271
Hair color
  black                   .171         -1.214
  blond                   .296         .750
  gray                    -1.359       .379
  red                     1.315        .635
Hair length
  short                   -.566        -1.170
  medium                  -1.196       1.078
  long                    1.022        .338
Body length
  shortish                .781         -.128
  average                 .264         -.710
  tall                    -1.217       .645
Profession
  criminologist           0            0


Figure 21.1 Category quantifications of example
solution


Figure 21.2 plots the object scores, and
should be viewed as superimposed on Figure
21.1. Combining the information from the two
graphs, some nice clustering becomes clear. On
the right-hand side, we see all four females. Red
hair is placed in this vicinity, exactly on the
spot where the only respondent with red hair
was placed (1.315, .635). Long hair is placed
close by. Blond hair is also close to the category
quantification for “female”, but as there is
also a male with blond hair it tends somewhat
to the left-hand side of the picture where the
males were placed. The male criminologists are
placed on this left-hand side, but much more
spread out. The two shortish men with black
hair are placed in the bottom left of the picture.
Two tallish men with medium-length hair
have been placed in the top left-hand side of
the figure. There is one man left (Gerben) who
does not fit either homogeneous subgroup: this
is a tall man (for which reason he should be
placed with the upper left cluster) with, however,
short hair (for which he should be placed
with the lower left cluster). As an intermediate
solution, he is placed in between.

For a more in-depth discussion of the 
HOMALS program, its algorithm, and output, 
we refer to Gifi (1990), Greenacre (1993), and 
Van de Geer (1993). 


Figure 21.2 Object scores of example multiple
correspondence solution


2.2 Extension to ordered and interval
variables


In the example above, we treated all variables
as nominal variables, i.e., the categories have
been treated as if they partition respondents
into qualitatively different subsets. The variables
“Body length” and “Hair length”, however,
partition respondents into subsets that
reflect a certain ordering as well: from shortish
to average to tall, and from short to medium
to long. Treating these categories as unordered
implies that for these essentially ordered
categorical variables we analyze nonlinear
relations with the other variables: we look simply
at associations at category level. One might
however want to investigate relations between 
variables where this ordering is taken into 
account. In the context of multiple correspon- 
dence analysis this means that one would con- 
strain the category quantifications of the orig- 
inal categories to be on a line, on which they 
would need to be ordered just like their orig- 
inal ordering. This argument can be extended 
in the sense that if the categories would reflect 
not just monotone ordering, but would reflect 
a certain quantity (say, whether a respondent 
has 1, 2 or 3 children) one might want these 
categories to be quantified such that not only 
the original ordering is preserved, but also the 
interval properties of the categories. In that case 
one would constrain the category quantifica- 
tions to be not only on a straight line, preserving 
the original monotone ordering, but also request 
the intervals between the categories to be 
equidistant. 
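
The ordinal restriction can be read as a monotone regression problem: find the non-decreasing sequence of category values that is closest, in a frequency-weighted least squares sense, to the unrestricted values. The sketch below uses the pool-adjacent-violators algorithm, a standard way of solving that problem; we do not claim that any particular program implements the restriction exactly like this, and the input values and frequencies are hypothetical.

```python
def monotone_regression(values, weights):
    """Weighted least squares fit of a non-decreasing sequence to `values`,
    computed with the pool-adjacent-violators algorithm."""
    blocks = []  # each block: [weighted mean, total weight, number of items]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Pool neighbouring blocks for as long as the ordering is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, weight2, count2 = blocks.pop()
            mean1, weight1, count1 = blocks.pop()
            pooled_weight = weight1 + weight2
            pooled_mean = (mean1 * weight1 + mean2 * weight2) / pooled_weight
            blocks.append([pooled_mean, pooled_weight, count1 + count2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical unrestricted values for the ordered categories
# short < medium < long, with hypothetical category frequencies.
unrestricted = [-0.6, -1.2, 1.0]
frequencies = [3, 2, 4]

ordinal = monotone_regression(unrestricted, frequencies)
print(ordinal)
```

In this example the first two categories violate the required order and are pooled to their weighted mean, so they receive the same value. Such ties are what a weak monotone transformation permits.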

In case such constraints are desired, we have 
a set of categorical variables of mixed mea- 
surement level: some are nominal, some are 
ordinal and some are numerical. The technique 
then to analyse these data, taking into account 
the mixed measurement level is called non- 
linear principal components analysis. Nonlin- 
ear principal components analysis can thus be 
seen as an extension to multiple correspon- 
dence analysis, namely where for some vari- 
ables category quantifications are found under 
special constraints. Nonlinear principal compo- 
nents analysis can also be viewed as an exten- 
sion of ordinary principal components analysis, 
with optimal scaling of categorical variables 
(Gifi, 1990, Chapter 4; Young, Takane and 
De Leeuw, 1978). 

The first kind of transformation is generally 
referred to as “nominal”, the second as “ordi- 
nal”, and the third as “interval”. In optimal 
scaling programs the ordinal scaling of category 
values generally also allows for weak monotone
transformations. Also, nominal variables
may be transformed such that they are on a
straight line, albeit in any order;
this is often referred to as “single nominal”.
In this case, the two-dimensional quantification
space has been reduced to one dimension.
Recent extensions to the software have
also made it possible to opt for more sophis- 
ticated transformations of variables such as 
spline transformations, and have added user- 
friendly amenities such as imputation of miss- 
ing values, variable weighting, etc. (see SPSS, 
2006). Nonlinear principal components anal- 
ysis is available in SPSS in the procedure 
CATEGORIES (SPSS, 2006) and in SAS in the 
procedure PRINQUAL (SAS, 2006b). 

Similarly to multiple correspondence anal- 
ysis, nonlinear principal components analy- 
sis aims at maximizing the similarity between 
the quantified variables. Again, maximum 
similarity is attained when all the quantified 
variables are—simultaneously—as similar as 
possible, or when all the quantified variables 
are as similar as possible to the object scores 
Z, which amounts to minimizing the same loss
function as in multiple correspondence analysis,
under the condition, however, that the
matrices $B_j$ containing the category
quantifications $c_j$ satisfy

$$B_j = c_j a_j'$$

for variables treated as single (i.e., the single
nominal variables, the ordinal variables, and
the interval variables), where the $p \times 1$ vectors
$a_j$ are the correlations between the transformed
variables $I_j c_j$ and the object scores Z. These
correlations are in the context of optimal scaling
referred to as the “component loadings”. They
are comparable to the discrimination measures
in multiple correspondence analysis. Any
unrestricted scores in $B_j$ are referred to as the
“multiple category coordinates”; the restricted scores
$c_j a_j'$ are referred to as the “single category
coordinates”: these are the coordinates of the
categories restricted to be on a single straight line
in the solution space.
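
A short numerical sketch may help to see why the single category coordinates lie on one straight line: they are the outer product of a vector of single quantifications and a vector of component loadings, so every category is a multiple of the same direction. All numbers below are hypothetical and the variable names are ours.

```python
# Hypothetical single quantifications for an ordinal variable with three
# categories, listed in their original order (non-decreasing).
c = [-1.2, 0.1, 1.1]
# Hypothetical component loadings for a two-dimensional solution.
a = [0.8, 0.4]

# Single category coordinates: the outer product of c and a
# (one row per category, one column per dimension).
single_coordinates = [[c_value * loading for loading in a] for c_value in c]

for category, point in zip(["short", "medium", "long"], single_coordinates):
    print(category, point)
```

Each point is a multiple of the loading vector, so the categories fall on the line through the origin in the direction of the loadings, in the order given by the single quantifications.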
The interpretation of nonlinear principal
components solutions runs parallel to that of
multiple correspondence analysis solutions.
Often two dimensions are chosen, and the
interpretation is done by looking for clusters
of respondents and categories. Again, respondents
and categories placed in the periphery have
more particular, characteristic answering patterns,
whereas respondents and categories
placed in the center do not belong to homogeneous
subgroups of respondents with particular
profiles of answering patterns. While in
multiple correspondence analysis one always 
uses the category quantifications for interpreta- 
tions, in nonlinear principal components anal- 
ysis one does so only for those variables which 
function in the solution as multiple nominal. 
For single variables, the single category coor- 
dinates are used, i.e., the category quantifica- 
tions that are restricted to be on a line. Also, 
whenever single quantifications are present, 
solutions are not nested. 


2.3 Extension to more than one set
of variables 


In the models discussed thus far, all variables 
have been treated similarly, i.e., each variable 
plays the same role in that we investigate the 
relation of this variable to all other variables 
simultaneously. However, in many instances it 
may be the case that we are not interested in the 
relations between all variables simultaneously, 
but rather in the relations between one group of 
variables on the one hand, and another group 
of variables on the other hand. For example, we 
might be interested in the relation between a 
number of socioeconomic characteristics, and 
a number of demographic characteristics, or, 
we may be interested in the relations among a
number of patient personality characteristics, a
number of therapist characteristics, and a number
of psychodynamic therapy characteristics. In that case, not
all variables play the same role. Variables are 
in such cases conceptually divided into sets,
and we are not so much interested in the relations
between variables within these sets (e.g.,
between the personality characteristics); rather,
we are interested in the relations between the
sets of variables.

As in the examples above, we are thus look- 
ing for homogeneous subgroups of respondents; 
however, we do not want to investigate their 
similarity on the basis of the interrelations of 
all variables, but on the basis of the interrela- 
tions of the variables in the different subsets. 
A linear variety of this technique is referred 
to as generalized canonical analysis (GCA). 
When variables of mixed measurement level 
are included it is called nonlinear generalized 
canonical analysis (see Van der Burg, De Leeuw 
and Verdegaal, 1988; and Gifi, 1990, Chapter 5). 
Nonlinear GCA maximizes the correspondence 
between two or more sets of variables. For doing 
so, a weighted sum of the variables in the vari- 
ous sets is constructed and the correspondence 
between these weighted sums is optimized, just 
like in canonical correlation analysis or mul- 
tiple regression analysis. Nonlinear GCA has 
been implemented in the computer program 
OVERALS, available in the SPSS procedure 
CATEGORIES (SPSS, 2006). 

The loss function for nonlinear generalized 
canonical correlation analysis is an extended 
version of the one for multiple correspondence 
analysis. The difference is that there are now
K sets of variables, each with $M_k$ variables.
The indicator matrix of variable j of set k is
written as $I_{jk}$, and the matrix of category
quantifications as $B_{jk}$. The loss function to be
minimized is an extended version of loss function
(1). Again, the vector $c_{jk}$ contains the numerical,
ordinal or nominal quantification, and $a_{jk}$
is the vector of weights, now each time for the
j-th variable in the k-th set. The badness-of-fit
can be broken down by set and by dimension.
The eigenvalues again relate to the fit
and reflect as before the explained variance
of a dimension, although this explained variance
now refers back to the weighted sums
of variables, and not to the respective variables.
This implies that it is theoretically possible
to find a good congruence (high fit)
between sets, but low explained variance of the
variables.
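
Written out in the notation of loss function (1), and leaving aside any normalizing constant, the extended loss can be sketched as follows. This form is our reading of the verbal description above, with $I_{jk}$ the indicator matrix and $B_{jk}$ the matrix of category quantifications of variable j in set k; it is not a formula quoted from the chapter.

$$
\sigma(Z, B_{11}, \ldots, B_{M_K K}) = \sum_{k=1}^{K} \mathrm{SSQ}\Bigl(Z - \sum_{j=1}^{M_k} I_{jk} B_{jk}\Bigr),
\qquad B_{jk} = c_{jk} a_{jk}' \ \text{for single variables.}
$$

With one variable per set, the inner sum has a single term and the expression has the same form as loss function (1), in line with the special cases discussed later in this section.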

The vector $a_{jk}$ contains the weights per
dimension that sum the variables into the
weighted sum, also encountered in the literature
under the name canonical variate. As
in multiple regression, these weights depend
for each variable not only on the particular
variable’s contribution, but also on the
contributions of the other variables within the set.
This implies that these weights are susceptible
to multicollinearity, so that they cannot
be interpreted in a straightforward manner.
For variables considered as multiple nominal,
the scores $B_{jk}$ are the multiple quantifications
of the categories. However, also for multiple
nominal variables, interpretation of their
relevance is less straightforward than in the
one-set case. Analogously to the single variables
situation, the p-dimensional quantifications of
the categories of multiple nominal variables
take into account the effect of the other variables
within the set as well: the effect of
multicollinearity is incorporated in the multiple
category quantifications. Thus, whereas
in multiple correspondence analysis the category
centroids were identical to the category
quantifications $B_j$, in nonlinear multiset analysis
category centroids and category quantifications
are different entities. For that reason, for
interpreting the association of the categories of
a multiple variable, it is advisable to inspect
the category centroids of the object scores; for
interpreting the association of the categories
of a single variable, the projected category
centroids must be inspected. Whenever single
quantifications are present, nonlinear generalized
canonical analysis solutions are not nested,
just as in nonlinear principal components
analysis.

As in the previous two techniques, the
object scores again form an orthonormalized
system, and category centroids, projected
centroids, component loadings, and object scores
are part of the same solution and also the
same space. Respondents are again characterized
by the categories that have been placed
in their vicinity, and respondents sharing the
same profile form a homogeneous subgroup of
respondents; respondents placed at a distance
have dissimilar patterns and belong to different
subgroups. Categories in the periphery of the
solution characterize the more homogeneous
subgroups of subjects. Categories in the center
of the solution are shared by many subjects,
and generally cannot be used to characterize
distinct subgroups.

When there are only two sets, nonlinear gen- 
eralized canonical analysis is the nonlinear ana- 
log of canonical correlation analysis (Hotelling, 
1936; Tatsuoka, 1988, Chapter 7). In case of two 
sets of variables and mixed measurement level 
variables, nonlinear GCA is—but for super- 
ficial differences—identical to the model for 
nonlinear canonical correlation analysis intro- 
duced by Young, De Leeuw and Takane (1976) 
and by Van der Burg and De Leeuw (1983). If 
there is one variable per set, nonlinear multiset 
analysis reduces to nonlinear principal compo- 
nents analysis. If there is one multiple nomi- 
nal variable per set, nonlinear multiset analysis 
reduces to multiple correspondence analysis. 
Carroll (1968) defined linear generalized canon- 
ical analysis or K-sets analysis; other possibili- 
ties for linear K-sets analysis were described by 
Kettenring (1971) and Van de Geer (1984). For 
more details on nonlinear multiset analysis, see 
Van der Burg, De Leeuw and Verdegaal (1988), 
and Gifi (1990). 


3 Analyzing longitudinal data 
using optimal scaling techniques: 
two strategies 


Optimal scaling techniques can be used flexibly 
for analyzing longitudinal data. The reasons for 
this are that, firstly, as these techniques make
no distributional assumptions, there is also no 
need to accommodate the statistical complica- 
tions that repeated measures induce. Secondly, 
optimal scaling techniques can fairly flexibly 
handle missing observations that are an incre- 
mental problem in most situations of repeated 
observations. Thirdly, they have nice features 
to model and explore growth. Last but not least, 
categorical variables can be easily included in 
the analysis. 

There are basically two strategies for analyz- 
ing mixed measurement longitudinal data in 
which optimal scaling techniques can be used. 
In the first, the data box of observations (the first 
dimension being the respondents, the second 
dimension being the variables, and the third 
being the time points) is flattened and ana- 
lyzed. In the second strategy, some kind of time- 
series regression is performed. This can either 
be done by regressing the time-dependent mea- 
surements on the time-axis or on some transfor- 
mation of it, or by accommodating time in some 
other manner, for instance by regressing the 
time-dependent measurements on their lagged 
versions. 

If we have measurements over N respondents 
and k variables, obtained over T time points, 
we are faced with an N x k x T data box. In 
the first strategy, the data box is first treated 
as if it were cut into T slices, or matrices, 
with one N x k matrix of observations per time 
point. Ordinary cross-sectional analysis works
on one such N x k matrix. The T slices/matrices
of observations per measurement wave can be
analyzed longitudinally using optimal scaling
techniques as follows: a first option is to stack
the T N x k matrices vertically into an NT x k
matrix. Researchers have also
employed this method outside of the context 
of optimal scaling, and the resulting data file 
is also referred to as a “person-period file”, or 
as a LONG matrix. Each respondent appears 
as often in the flattened data box as he or she 
was observed. This obviously creates
statistical problems if conventional statistical
methods are used, because the number of
independent observations is less than the NT
rows in the data matrix. Thus, statistical tests
will be inflated, as the variance of the estimators
is underestimated and the degrees of freedom
employed are too high. This is not a
problem when employing optimal scaling techniques,
as these techniques serve essentially for
exploration. Provisions do need to be made, however,
to visualize and interpret the development of
respondents over time.

In principle, it would also be possible to
flatten the data box by stacking the T N x k
matrices not vertically, but horizontally (this is
referred to as the BROAD matrix). In that case,
the N x k x T data box becomes an N x kT
data matrix. In many practical situations, however,
the number of variables may, with increasing
numbers of time points, quickly become too
large for analysis. As with the previous flattening
option, the BROAD matrix can also be analyzed
with ordinary analysis techniques. See Visser
(1985, pp. 48-55), who formalized these options
and discussed statistical implications.
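
The two flattening options can be made concrete with a small sketch. The fragment below is only an illustration and is not part of any optimal scaling program: the toy data box (N = 2, k = 2, T = 3), its values, and the helper names `to_long` and `to_broad` are assumptions introduced here.

```python
# Toy data box: box[i][j][t] is the score of respondent i on variable j
# at time point t. All values are made up for illustration.
box = [
    [[1, 2, 3], [4, 4, 5]],   # respondent 1: variable 1, variable 2
    [[2, 2, 1], [3, 5, 5]],   # respondent 2
]


def to_long(box):
    """Stack the T slices (each N x k) vertically: one row per respondent per time point."""
    n, k, t_max = len(box), len(box[0]), len(box[0][0])
    return [[box[i][j][t] for j in range(k)] for t in range(t_max) for i in range(n)]


def to_broad(box):
    """Stack the T slices horizontally: one row per respondent, k columns per time point."""
    n, k, t_max = len(box), len(box[0]), len(box[0][0])
    return [[box[i][j][t] for t in range(t_max) for j in range(k)] for i in range(n)]


long_matrix = to_long(box)      # NT x k
broad_matrix = to_broad(box)    # N x kT
print(len(long_matrix), len(long_matrix[0]))     # 6 2
print(len(broad_matrix), len(broad_matrix[0]))   # 2 6
```

In the LONG layout the number of rows grows with T, whereas in the BROAD layout the number of columns does, which is why the latter quickly becomes unwieldy when there are many time points.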

In the second strategy where some kind of 
time-series regression is performed, the time- 
dependent measurements are, after flattening 
the data box to a LONG matrix, regressed on 
the time-axis or on lagged versions of the mea- 
surements. This means that in this strategy 
we always employ several sets of variables: one 
that contains the variables of interest and one 
that contains the regressors. In the context of 
optimal scaling techniques we will thus almost 
always use multiset analysis. As an example, 
we could regress the measurements from time 
point 1 to time point 10 on a dummy time- 
variable that has values 1, 2,..., 10. This time- 
variable can in the context of optimal scaling 
be treated as an interval variable, in which 
case we perform some symmetrical version of 
ordinary time-series regression. We could, how- 
ever, also treat the time-variable as an ordinal 
or a spline variable, relaxing the assumptions 
and not requiring developments over time to be 
linear but allowing other monotone transformations
of change. Of course, we could also allow
the time-variable to be single nominal, in which 
case we freely explore, seeking for maximum 
correspondence between the variable set and 
the dummy time-variable set, how the variables 
develop over time, also allowing for curvilin- 
ear or even more irregular development of the 
variables over time. 

Within this second regression strategy we 
could also regress the variables not on the time- 
axis itself, but on some lagged version of the 
variables, in some kind of autoregressive setup. 
This option is in a sense a variety of the hor- 
izontal stacking option (the BROAD matrix) in 
the flattening strategy. Suppose that we have 
measurements on two time points. In that case 
we could simply regress the variables measured 
at t = 2 on the same variables measured at t = 1,
investigating how the variables at t = 1 are associated
with the variables at t = 2. Or, if we have
measured respondents at 4 time points, then we
could stack the measurements for t = 2, t = 3
and t = 4 into a first set, and those for t = 1,
t = 2, and t = 3 in a second set, such that each
respondent’s scores at time point T = t in one
set are related to his or her scores at time point
T = t - 1.
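
The construction of the two lagged sets can be sketched as follows. The scores and the helper name `lagged_sets` are hypothetical; the point is only to show which measurements end up side by side.

```python
# scores[i][t] is the score of respondent i at time point t + 1 on one variable.
# The values are made up; 3 respondents, 4 time points.
scores = [
    [1, 2, 3, 4],
    [2, 2, 3, 3],
    [4, 3, 2, 1],
]


def lagged_sets(scores, lag=1):
    """Pair every measurement at time t with the measurement at time t - lag."""
    current, lagged = [], []
    for row in scores:
        for t in range(lag, len(row)):
            current.append(row[t])         # first set: t = 2, 3, 4
            lagged.append(row[t - lag])    # second set: t = 1, 2, 3
    return current, lagged


current, lagged = lagged_sets(scores)
print(current)  # [2, 3, 4, 2, 3, 3, 3, 2, 1]
print(lagged)   # [1, 2, 3, 2, 2, 3, 4, 3, 2]
```

With 4 time points and a lag of 1, each respondent contributes 3 rows rather than 4: one time point is lost for every lag.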


3.1 Examples of the two strategies 


We now analyze a fictional example. In the 
example, 10 respondents have been measured 
each at 4 time points. For each respondent for 
each time point we have collected data on the 
variables “Happiness” and “Sense of purpose”. 
These latter variables are measured on 4-point 
scales. Respondents are divided into three ther- 
apy groups: type A, B and C. See Table 21.4. 
We start by performing a one-set analysis, 
treating each variable as a categorical variable. 
The data box has been flattened with data matri- 
ces for each time point stacked vertically, so 
that each respondent appears 4 times in the 
solution: instead of 10 observations the analysis 
is thus run for 40 observations. We investigate 


Table 21.4 Data in longitudinal example dataset


 R   t  Ha  Pu  Th      R   t  Ha  Pu  Th

 1   1   1   1   A      6   1   2   2   B
 1   2   2   2   A      6   2   2   1   B
 1   3   3   2   A      6   3   3   2   B
 1   4   4   3   A      6   4   3   3   B
 2   1   1   2   A      7   1   1   1   B
 2   2   2   2   A      7   2   3   2   B
 2   3   3   3   A      7   3   3   3   B
 2   4   4   4   A      7   4   4   4   B
 3   1   2   2   B      8   1   1   1   B
 3   2   2   3   B      8   2   2   1   B
 3   3   3   3   B      8   3   2   2   B
 3   4   3   4   B      8   4   4   2   B
 4   1   2   1   B      9   1   4   2   C
 4   2   2   1   B      9   2   3   1   C
 4   3   3   1   B      9   3   2   1   C
 4   4   3   2   B      9   4   1   1   C
 5   1   1   1   A     10   1   4   4   C
 5   2   2   2   A     10   2   3   3   C
 5   3   3   2   A     10   3   2   1   C
 5   4   4   4   A     10   4   1   1   C


Note: R = respondent number, t = time point, Ha = Happiness,
Pu = Purpose, Th = type of therapy


the association between the variables “Happi- 
ness”, “Purpose”, and “Therapy”. Doing so, the 
analysis converges in 26 iterations to an accept- 
able fit measure of 1.117, with eigenvalues .585 
for the first dimension, and .532 for the sec- 
ond dimension. The discrimination measures 
for the variables per dimension are given in 
Table 21.5. 
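
For readers who wish to experiment, the one-set analysis can be approximated with a few lines of linear algebra. The sketch below is not the HOMALS program: it computes a homogeneity analysis directly from the eigendecomposition of the averaged category projectors, the function names `indicator` and `homals` are our own, and the data are entered according to our reading of Table 21.4. The output should be comparable to, though not necessarily identical with, the values reported in the text, and the signs of the object scores are arbitrary.

```python
import numpy as np

# Table 21.4 in LONG format (10 respondents x 4 time points = 40 rows),
# entered respondent by respondent; values follow our reading of the table.
happiness = [1, 2, 3, 4,  1, 2, 3, 4,  2, 2, 3, 3,  2, 2, 3, 3,  1, 2, 3, 4,
             2, 2, 3, 3,  1, 3, 3, 4,  1, 2, 2, 4,  4, 3, 2, 1,  4, 3, 2, 1]
purpose = [1, 2, 2, 3,  2, 2, 3, 4,  2, 3, 3, 4,  1, 1, 1, 2,  1, 2, 2, 4,
           2, 1, 2, 3,  1, 2, 3, 4,  1, 1, 2, 2,  2, 1, 1, 1,  4, 3, 1, 1]
therapy = [group for group in "AABBABBBCC" for _ in range(4)]


def indicator(codes):
    """Indicator (dummy) matrix with one column per observed category."""
    categories = sorted(set(codes))
    return np.array([[float(c == cat) for cat in categories] for c in codes])


def homals(columns, ndim=2):
    """Homogeneity analysis via the averaged, centered category projectors."""
    n = len(columns[0])
    projectors = []
    for codes in columns:
        g = indicator(codes)
        # Projector onto the category space of this variable.
        projectors.append(g @ np.linalg.inv(g.T @ g) @ g.T)
    centering = np.eye(n) - np.ones((n, n)) / n
    average = centering @ (sum(projectors) / len(projectors)) @ centering
    values, vectors = np.linalg.eigh(average)
    order = np.argsort(values)[::-1][:ndim]
    scores = np.sqrt(n) * vectors[:, order]        # object scores, X'X = nI
    # Discrimination measure: variance of the object scores explained
    # by the categories of each variable, per dimension.
    disc = np.array([[scores[:, s] @ p @ scores[:, s] / n for s in range(ndim)]
                     for p in projectors])
    return values[order], scores, disc


eigenvalues, object_scores, discrimination = homals([happiness, purpose, therapy])
print(np.round(eigenvalues, 3))
print(np.round(discrimination, 3))
```

Each row of `discrimination` corresponds to one variable (Happiness, Purpose, Therapy) and each column to one dimension.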

As can be seen from Table 21.5, “Therapy” 
does not discriminate respondents over time 
points very well: on the first dimension there 
is hardly any discrimination, on the second 
dimension it is a little more. This is not surpris- 
ing as the data show that there are pretty consis- 
tent and similar developments over time over 
the two variables “Happiness” and “Purpose”, 
and the variable “Therapy” is almost orthogonal 
Table 21.5 Discrimination measures for the
analysis of longitudinal data according to the
flattening strategy


Variable      Dimension 1   Dimension 2
Happiness        .786          .681
Purpose          .867          .587
Therapy          .103          .327


to these developments. Given that “Happiness” 
and “Purpose” have such a strong association, 
it is logical that these two variables dominate 
the solution over the single “Therapy” variable. 
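
The figures reported above hang together in a simple way: in homogeneity analysis the eigenvalue of a dimension is the average of the discrimination measures on that dimension, and the fit is the sum of the eigenvalues. The following check, which uses only the values printed in Table 21.5, illustrates this; the variable names are our own.

```python
# Discrimination measures as reported in Table 21.5 (dimension 1, dimension 2).
discrimination = {
    "Happiness": (0.786, 0.681),
    "Purpose":   (0.867, 0.587),
    "Therapy":   (0.103, 0.327),
}

m = len(discrimination)  # number of variables
eigenvalues = [sum(d[s] for d in discrimination.values()) / m for s in range(2)]
fit = sum(eigenvalues)

print([round(e, 3) for e in eigenvalues], round(fit, 3))  # [0.585, 0.532] 1.117
```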

Figure 21.3 gives the structure of category 
quantifications. It can be seen that high scores 
on “Happiness” and “Purpose” are located in
the right-hand upper part of the figure, and
low scores are positioned at the opposite end 
of the structure. Middle values on these vari- 
ables are located in the middle bottom part of 
the figure. This is also where therapy type B is 
located. Type C is on average located close to 
low scores of happiness and a sense of purpose. 
Type A is located somewhat to the right of the middle,
showing that respondents following this ther- 
apy are characterized by slightly higher scores 
on “Happiness” and “Purpose”. Respondents 
following therapy type B tend to be charac- 
terized more by somewhat average scores on 
“Purpose” and “Happiness”, and respondents 
following type C tend to have lower scores 
on “Purpose” and “Happiness”. The so-called 
horseshoe structure of object scores and category
quantifications is typical for multiple
correspondence analysis and its associated
optimal scaling techniques, and is an artefact of
the least squares criterion used.

Figure 21.3 Category quantifications and trajectories of therapy groups

In Figure 21.3, we have also connected the 
average object scores of the respondents per 
therapy group per time point. We could also 
have connected for each respondent the scores 
at the various time points, but this gives a
picture that is not very insightful. What the lines
now show is how respondents in therapy A 
in general develop from fairly low scores on 
“Happiness” and “Purpose” to high scores. This 
happens with a swinging movement, following 
the horseshoe. Those in therapy C on the other 
hand, develop in the other direction; they seem 
to develop from better to worse. For those in 
therapy B, developments are positive, though 
not as marked as for those in therapy A. 

Now, we could also perform a regression- 
type analysis on this data. We will not give 
the results of such an analysis, as we will be 
carrying out a regression-type analysis in the 
empirical example below, but simply outline 
a number of options for doing this. Firstly, 
we could simply stack the time-dependent 
measurements on “Happiness” and “Purpose” 
in the first set, and construct a dummy-time 
variable in the second set, which we could 
treat as either an interval, ordinal, or cat- 
egorical variable. We could also investigate 
simultaneously to what extent developments 
on “Happiness” and “Purpose” are explained 
by respondents and time points, by adding a 
second variable to the second set that sim- 
ply codes the respondents. In this manner we 
investigate the association between the vari- 
ables measuring well-being in the first set, and 
the variables measuring static interindividual 
differences (the variable indicating the respondent)
as well as dynamic intraindividual differences
(the time variable) in the second set. See
Table 21.6, which gives the variables in the sets,
with their measurement levels.

Table 21.6 Design of multiset analysis of
therapy outcomes and explanatory static
and time-dependent variables


Set 1                            Set 2
Respondent (multiple nominal)    Purpose (ordinal)
Time (multiple nominal)          Happiness (ordinal)


Running such time-regression analyses, one
would expect a solution that gives an interpretation
not wildly different from the previous analysis.
However, there might be some differences,
as we now attempt to actually discriminate
the time points as well as the respondents on
the basis of the variables measuring well-being,
while in the previous flattening analysis we
simply looked at the association between “Happiness”,
“Purpose” and “Therapy”, not taking
into account the time dimension.


3.2 The two strategies revisited 


In the first flattening option (the LONG matrix),
in which each respondent appears as often in the data matrix as
he or she was observed, it is assumed that 
changes as they occur materialize in change 
in the respondents. In other words, respon- 
dents change in a stable world. While such 
an analysis is generally easily carried out and 
easy to interpret, there are some disadvan- 
tages as well. The first of these is spurious 
effects. If the variables do not interrelate but 
exhibit the same kind of growth over time, the 
vertically stacked super-matrix will generate a 
positive association between the variables. The 
correlation is spurious because it would dis- 
appear when controlling for time. In general, 
it is thus wise to check solutions of such a 
vertically stacked matrix also at a number of 
cross-sections to see whether the associations 
found over time also exist cross-sectionally 
and are not merely an artefact of the stack- 
ing. Another problem may be that in flattening 
and vertically stacking the data matrices, the 
ordering in time of the respondents is lost: if
we were to permute the rows of the NT x k
data matrix, we would obtain exactly the same
solution. Adachi (2000) provided a solution 
for this by constraining the time-ordered object 
scores such that subsequent observations do not 
receive very different scores, and subsequently 
(Adachi, 2002) by less restrictively smoothing 
the object scores over the time dimension and 
tying them in the time ordering. 
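
The spurious association that vertical stacking can produce is easily demonstrated by simulation. The sketch below is purely illustrative: the sample size, the number of waves, the noise level and the variable names are all assumptions. Two variables grow over time at the same rate but are otherwise independent; their correlation is then computed once over the stacked data and once within each wave.

```python
import random

rng = random.Random(1)   # fixed seed so that the illustration is reproducible
N, T = 200, 4            # hypothetical numbers of respondents and waves


def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Both variables increase by one unit per wave; the noise terms are independent.
waves = {
    t: ([t + rng.gauss(0, 0.5) for _ in range(N)],
        [t + rng.gauss(0, 0.5) for _ in range(N)])
    for t in range(1, T + 1)
}

# Correlation in the vertically stacked (LONG) data.
stacked_x = [value for t in waves for value in waves[t][0]]
stacked_y = [value for t in waves for value in waves[t][1]]
pooled = pearson(stacked_x, stacked_y)

# Correlation within each cross-section.
per_wave = {t: pearson(*waves[t]) for t in waves}

print(round(pooled, 2), {t: round(r, 2) for t, r in per_wave.items()})
```

The stacked correlation is substantial although the cross-sectional correlations hover around zero, which is exactly the situation in which checking a number of cross-sections guards against misinterpretation.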

In the second flattening option (the BROAD 
matrix) in which each variable appears as often 
in the data matrix as it was measured, it is 
assumed that changes as they occur materialize 
in change in the variables. The analysis tech- 
nique views every variable at each time point as 
a new, different variable. Respondents in other 
words stay put in a world that evolves around 
them. This has substantive but also practical 
implications. If we have, for instance, a variable
that indicates the type of drug that respondents
use, measured at three time points,
the optimal scaling technique regards this variable
not as one, but as three variables. Each of
these variables is freely quantified, and there 
is thus no guarantee that the quantifications of 
the same variable at the various time points 
will be identical. And if the quantifications 
are indeed different, the rescaled variable is 
not comparable anymore across time points, 
which creates fairly grave conceptual difficul- 
ties, and may force the researcher to aban- 
don the analysis. The SERIALS program (Van 
Buuren, 1990; 1997) ensures that the quantifi- 
cations of the various temporal versions of the 
same variable are identical; as this, however, 
is not implemented in standard software, for 
practical purposes, the second option must be 
considered less attractive. In spite of these lim- 
itations, it may be a useful option neverthe- 
less, or the only feasible option, when different 
variables have been measured on subsequent 
occasions. 

Analyzing the LONG or the BROAD matrix 
has consequences for the handling of missing
data. Missing data on a not too large number of
variables can be handled fairly easily by optimal
scaling techniques. When the T N x k data
matrices are stacked vertically, respondents
lost to follow-up can simply be omitted from the
waves in which they were not observed. Suppose
that 100 respondents have been observed
at time 1, 80 at time 2, and 60 at time 3, the data 
matrix to be analyzed that results after stacking 
is simply 240 (= 100 + 80 + 60) xk. All mea- 
surement occasions for which data are available 
can be entered, and those occasions for which 
no observations were collected are simply left 
out. This means that series of unequal length 
can be included, and even interrupted series 
may be entered. The proportion of missing val- 
ues should not become too high, however. For 
instance, in HOMALS the discrimination mea- 
sures may become larger than one when there 
are many missing values. Unfortunately, no 
absolute guidelines on this proportion of miss- 
ing values can be given. 
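
The bookkeeping for series of unequal length is straightforward, as the following sketch shows. The attrition pattern (100, 80 and 60 respondents) is taken from the example above; the respondent numbers, the number of variables k, and the placeholder scores are hypothetical.

```python
# Respondents present at each wave: 100 at wave 1, 80 at wave 2, 60 at wave 3.
present = {1: range(1, 101), 2: range(1, 81), 3: range(1, 61)}
k = 3  # hypothetical number of variables

rows = []
for wave, respondents in present.items():
    for resp in respondents:
        # Placeholder scores; in practice these are the k observed values.
        rows.append((resp, wave, [0] * k))

# Length of the series that each respondent contributes to the stacked matrix.
series_length = {}
for resp, wave, _ in rows:
    series_length[resp] = series_length.get(resp, 0) + 1

print(len(rows))                                                # 240
print(series_length[1], series_length[70], series_length[95])   # 3 2 1
```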

In the flattening strategy, time as such is thus 
not explicitly the object of analysis. By stacking
the measurement occasions of respondents vertically,
we are able to
explore the “travel” over time of respondents
through the p-dimensional space of the solution.
In the time-series regression-type strategy,
on the other hand, time is much more explicitly
modeled, in the sense that the repeatedly
measured variables are regressed on variables
that are meant to capture the dynamic and time-
dependent nature of the measurements. Thus,
the time-dependent vertically stacked variables
in one set are related to some version of
the time-axis in the other set, and, as we are 
seeking to maximize the homogeneity between 
the sets, we are actually attempting not only to 
summarize respondents’ average scores at each 
time point, but we are also actually attempt- 
ing to discriminate the time points in terms 
of the variables, in order to obtain a profile 
of each time point in terms of the variables of 
interest. As such we can answer questions like: 
What were the general conditions of respon- 
dents when they embarked upon therapy? How 
was their condition midway? and How can their
general condition upon completion of therapy 
be described? 

Extending creativity further, of course one 
could employ not only one time variable and 
perhaps a variable reflecting interindividual 
differences, investigating interindividual and 
intraindividual differences on the variables of 
interest simultaneously, but we could actu- 
ally include several dummy time-variables. We 
could request, for instance, the first variable 
(t=1, 2, 3, ..., 11) to be scaled interval, and 
incorporate a second variable (t = 1, 2, 3, 4, 
5, 6, 5, 4, 3, 2, 1) to be also scaled inter- 
val, and as such investigate what growth curve 
fits the development over time of the vari- 
ables of interest better. Also, as we usually 
investigate in two-dimensional space, we could 
explore whether perhaps on some variables the 
development is linear or monotone increasing, 
while on others, as reflected in another dimen- 
sion, growth is rather curvilinear, or has ups 
and downs. In this way, we approach hierar- 
chical modeling or growth-curve analysis. See 
also Michailidis and De Leeuw (1997; 2000),
who presented a version of multiple correspon- 
dence analysis that provides a solution space 
for all respondents but shrinks or expands the 
parameters differently for different clusters of 
respondents. 
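
The two dummy time-variables mentioned above are easily constructed. The sketch below builds the linear coding and the rising-then-falling coding for 11 time points; the names `linear`, `tent` and `time_set` are our own.

```python
T = 11
linear = list(range(1, T + 1))                  # 1, 2, 3, ..., 11
peak = (T + 1) // 2                             # turning point, here t = 6
tent = [peak - abs(t - peak) for t in linear]   # rises to the peak, then falls

# One row per time point; both columns can be entered in the time set.
time_set = list(zip(linear, tent))

print(tent)  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```

In the LONG matrix each row would receive the pair of codes that belongs to its time point.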

What we do is not fundamentally different 
from ordinary time-series regression, apart from 
the fact that multiset analysis is essentially a 
symmetrical technique as it looks for associa- 
tion between the weighted sums of the variables 
of the sets, and does not regress the weighted 
sum (or in case of only one variable in a set, 
a variable) on a set of variables. It is possi- 
ble through some computations to rewrite the 
results into asymmetrical form (see Bijleveld, 
1989, Chapter 4). When there is only one variable
in the time-set, it is possible to use the procedure
CATREG from SPSS Categories, which
is a nonlinear analog of ordinary, asymmetrical
regression (SPSS, 2006). The advantage, again, is
that no distributional assumptions are made and
no tests are performed, so the time dependence of
the measurements cannot complicate the analysis.

Note, that when relating lagged versions of 
variables, respondents lost to follow up at 
some later time generate particular problems, 
as respondents lost to follow up also affect 
other measurement waves. Also, when relat- 
ing lagged versions of variables, one essentially 
looks for what is similar between the measure- 
ments at the successive time points; in many 
situations, the research questions focus more 
on what changes over time. Of course, it is also 
possible to regress the variables at t = 2 not 
on the same variables at t = 1, but on differ- 
ent variables at t = 1. We might, for instance, 
measure alcohol intake at day t, and headache 
at day t + 1, and see to what extent alcohol 
intake is associated with headache the next day. 
Also this analysis could be made fancier by 
including individual intercepts or separate sets 
of variables containing categories such as gen- 
der, or group membership such as type of ther- 
apy. It is also possible to perform such an anal- 
ysis for one respondent who has been measured 
on many occasions. Higher-order lags may be 
included as well. 

The analysis of lagged versions of variables, 
though conceptually attractive, has a number 
of limitations. Firstly, one loses one time point 
for every lag: the higher the order of the lags 
the more time points have to be deleted. There- 
fore, higher-order lags can become unattractive 
or even impossible to model when there are few 
time points. A second technical limitation is 
that, in rescaling the lagged versions of categor- 
ical variables, there is, just like when analyzing 
the BROAD matrix, no guarantee that the cate- 
gories of these lagged versions of the variables 
will receive identical quantifications. 


4 Example 


We analyze data on self-reported misde- 
meanours and offending from the Netherlands 
Schoolproject (Weerman, 2007). During three 
waves of data collection, several hundred
students were interviewed at a number
of secondary schools in the urbanized
center of the Netherlands. The schools all 
offered lower-level secondary education, being 
mainly VMBO (Voorbereidend Middelbaar 
Beroeps Onderwijs — Preparatory Mid-level 
Vocational Training) schools, with a number 
offering tailored vocational training such as 
an agrarian school and a technical secondary 
school. In the first wave, students were in either 
first or third year. In the second wave, students 
were either second or fourth year (the fourth 
year being the year in which students in these 
types of schools do their final exams). In the 
third wave all students were in their third year. 
The study design is as such an overlapping 
cohort design. 

At each wave, students were given the same 
questionnaire that they filled out classwise 
on the computer, with at least one researcher 
present to answer questions and help out when- 
ever necessary. The questionnaire asked about 
a multitude of issues, such as relations with 
peers, relations with parents, delinquent peers, 
bonding with school, self-control, and the like. 
Key dependent variables were delinquency and 
misdemeanor during the school year. Various 
demographic characteristics were also regis- 
tered. The survey is as such a self-report survey. 
The prominent questions in the Schoolproject 
were: (1) how do delinquency and misbehavior
develop over the course of the school career?
and (2) to what static and dynamic predictors 
are delinquency and misbehavior related? 

Not all students were present at all waves. 
Earlier analyses showed that during the course 
of the study, there was selective attrition 
(Weerman, 2006). In our analysis, we were able 
to include every student at every measurement 
that was present for him or her. So, if for 
instance a pupil was present only at waves 1 
and 3, these measurements were included in the 
dataset. If a pupil was present only at the third 
wave, then only that wave was included. As
was explained above, this does not hamper our
analysis. Repeat measurements were included
as “new cases”: the structure of the dataset is in 
Table 21.7. 

Every respondent thus plays a role in the 
analysis for those waves for which valid mea- 
surements were obtained. In total we had thus 
4769 rows in the dataset, for a total of 2661 
unique students. Just as missing waves did not
hamper the analysis, neither did missing values:
they were simply left missing and played no role
in the analysis. A number of very sparsely filled categories
were recoded to the adjacent category to prevent
outliers.

We chose to perform a multiset analysis 
(OVERALS). Given that the key dependent 
variables were misbehavior and delinquency, 
we put these in one set. The delinquency 
and misbehavior variables, being both variation 
measures capturing the variation in behavior 
(with a score of zero implying no delinquent 
behavior and a high score implying a range of 
delinquent behavior, which correlates strongly 


Table 21.7 Structure example dataset 


resp 1 wave 1 variables 
resp 1 wave 2 variables 
resp 1 wave 3 variables 
resp 2 wave 1 variables 
resp 2 wave 3 variables 
resp 3 wave 1 variables 
resp 3 wave 2 variables 
resp 3 wave 3 variables 
resp 4 wave 2 variables 
resp 4 wave 3 variables 
resp 5 wave 1 variables 
resp 5 wave 2 variables 
resp 5 wave 3 variables 
resp 6 wave 3 variables 
with the seriousness of delinquency) were both
pretty skewed, but as explained above, this
does not affect our analysis. Both variables were
treated as ordinal variables. Next, we put all 
variables capturing temporal development in 
a set; these variables were the waves, captur- 
ing a period effect (entered twice with a dif- 
ferent coding to enable both ordinal as well as 
nonlinear development). The third variable in 
this set was the respondent’s age, which var- 
ied from 11 to 19, with 99% of ages between 
12 and 17. The fourth variable was the respon- 
dent’s grade (which varied from first to fourth 
year). Both age and grade were treated as mul- 
tiple nominal variables in order to be able to 
model nonlinear developments. In a third set 
we put all static characteristics, these being the 
school, as well as respondents’ sex and eth- 
nicity (coded according to prevailing Statistics 
Netherlands definitions). A little over 50% were 
boys, around two-thirds of measurements were 
obtained for Dutch students, other sizeable eth- 
nic groups were Surinamese (a former colony), 
Turkish and Moroccan (mainly children of for- 
mer migrant workers) and other non-western 
migrants. Antilleans (the Dutch Antilles are 
still part of the kingdom of the Netherlands) 
and western migrants were less prominently 
present. All three variables in this set are nomi- 
nal variables and were treated as multiple nom- 
inal. A last, fourth set was formed by a number 
of dynamic predictors of delinquency and mis- 
behavior. At each wave, students had indicated 
whether they had a good relationship with
their parents and how strong their bond with
school was. These variables figure prominently 
in theories of juvenile delinquency. Respondents
had also indicated whether their friends
were delinquent; this variable is a prominent
and strong predictor of juvenile delinquency. 
Lastly, they had rated their own self-control. 
Most students reported good ties with their 
parents and average to good attachment to 
school. Self-control was approximately nor- 
mally distributed. Seriously delinquent friends 
were rare. All variables in this set were treated 
as ordinal. See Table 21.8 for an overview of 
the sets and their variables. 

We ran the analysis in two dimensions and 
set all analysis parameters otherwise to default. 
The algorithm converged to a fit of .86,
which may be viewed as acceptable. The object 
scores were well spread and did not show any 
indication of outliers. The first dimension had 
an eigenvalue of .470, the second dimension’s 
eigenvalue was .393. Table 21.9 gives a sum- 
mary of the loss per set per dimension. 

As Table 21.9 shows, the first dimension 
represents mainly the developments in delin- 
quency and misbehavior, as well as develop- 
ments in the dynamic predictors over time. The 
time variables play much less of a role here, 
as do the—and this is to be expected—static 
variables. This is not to say that the second 
dimension is an entirely static one: here the 
dynamic variables also play a role, although it is 
less prominent. All in all, it appears—given the 
loss per set—that delinquency over time is more 


Table 21.8 Overview of sets employed in analysis, variables and measurement levels


Set 1                 Set 2            Set 3              Set 4
delinquency (ordi)    time_1 (ordi)    school (mnom)      relation parents (ordi)
misbehavior (ordi)    time_2 (mnom)    gender (mnom)      relation school (ordi)
                      grade (mnom)     ethnicity (mnom)   self-control (ordi)
                      age (mnom)                          delinquent peers (ordi)


Note: mnom = multiple nominal, ordi = ordinal
Table 21.9 Loss per set per dimension


                                       Dimension 1   Dimension 2   Total loss
set 1 (delinquency and misbehavior)       .290          .602          .892
set 2 (time variables)                    .884          .606         1.490
set 3 (static background variables)       .664          .576         1.240
set 4 (dynamic predictors)                .281          .645          .926
eigenvalue                                .470          .393
fit                                                                   .863


strongly related to dynamic predictors that may 
vary over respondents over time, than to tempo- 
ral variables that are changing identically over 
time for each respondent, such as age or grade. 
Also, quite a bit of interindividual variation is 
apparently captured by individual differences 
as measured by the static predictors. 
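
The entries of Table 21.9 are linked by simple arithmetic: the eigenvalue of a dimension equals one minus the average loss over the sets on that dimension, and the fit is the sum of the eigenvalues. The check below uses only the reported values (with the decimal points of the set 4 entries restored, on the assumption that they were lost in typesetting); the variable names are our own.

```python
# Loss per set per dimension as reported in Table 21.9.
loss = {
    "set 1 (delinquency and misbehavior)": (0.290, 0.602),
    "set 2 (time variables)":              (0.884, 0.606),
    "set 3 (static background variables)": (0.664, 0.576),
    "set 4 (dynamic predictors)":          (0.281, 0.645),
}

n_sets = len(loss)
eigenvalues = [1 - sum(l[s] for l in loss.values()) / n_sets for s in range(2)]
fit = sum(eigenvalues)

print([round(e, 3) for e in eigenvalues], round(fit, 3))  # [0.47, 0.393] 0.863
```

A set with a low loss on a dimension, such as sets 1 and 4 on the first dimension, is well represented by that dimension.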


Figure 21.4 gives a representation of the solu- 
tion. For the multiple nominal variables, the 
category centroids are depicted. For the ordi- 
nal variables the projected centroids are used; 
the categories of these variables are connected, 
and they always form a line through the origin, 
with an arrow pointing to the high scores. For a
better overview, not all category centroids have
been depicted in the graph.

Figure 21.4 Category quantifications and trajectories of boys and girls from age 12 to 18
[Plot not reproduced; recoverable labels: Delinquency friends, Bond parents, Self-control]

Interpretation of the solution is done along 
the lines sketched above; briefly it amounts to 
investigating the proximity of categories in the 
solution, and projecting centroids on to those 
variables whose connected categories form an 
arrow. Cross-sectional analyses on the datasets 
revealed approximately the same interpreta- 
tion, so that we need not be afraid that the rela- 
tions are spurious. This leads to the following 
conclusions. 

Students who report delinquent behavior at 
some time point are located in the left-hand side 
of the figure, somewhat under the x-axis. Stu- 
dents who report misbehavior at some point are 
located in the upper left side of the figure. It is 
immediately obvious that delinquency at a certain
time point is strongly associated with having
delinquent friends. Likewise, misbehavior
is strongly associated with low self-control. The 
connected categories of the variable measuring 
students’ relationships with their parents at the 
various waves form an arrow that starts virtu- 
ally in the center—indicating how almost all 
students report a good relationship with their 
parents. It ends in the bottom left of the figure. 
Students’ ties with school at the various waves 
are also represented as an arrow, that points 
from a good bond in the middle upper right- 
hand side of the picture to a not so good bond 
in the middle lower left-hand side. 

Boys (the positions are not indicated in the 
graph) are positioned on average more in the 
upper left-hand side of the picture, close to less 
self-control, and more misbehavior, and also— 
though less strongly so—more delinquency and 
more delinquent friends. Girls are placed in 
the opposite end of the picture, close to less 
misbehavior and more self-control. The schools 
are well spread (also not indicated and ana- 
lyzed here). Antilleans (not indicated, but their 
average object score is in the left lower side 
of the picture) stand out for reporting relatively
bad relations with their parents and with
school, and for relatively high scores on delinquency.
Surinamese report about the same
level of delinquency but more misbehavior 
than Dutch. Moroccan and Turkish respondents 
have a delinquency and misbehavior profile 
that resembles that of girls most. It is often 
suspected that especially Moroccan youngsters 
underreport in self-report delinquency surveys. 
Given that they are overrepresented in police 
statistics, we should reckon with the possibil- 
ity that this is the case here too. Remarkably, 
hardly any period effects are found; both the 
ordinal and the multiple nominal version of the 
cohort variable did not fit notably. 

Next, we investigated the development of 
problem behavior over time by drawing two 
trajectories. We connected the average object 
scores of boys and girls at ages 12, 13, 14, etc. up 
to age 18 (at age 19 we had only a few respon- 
dents left). This shows how the trajectories for 
boys and girls are located respectively in the 
upper left-hand side and lower right-hand side 
of the picture. Boys start out at age 12 at an 
average misbehavior level, a low level of delin- 
quency, and with very good relations with both 
their parents, as well as school. Self-control 
assumes an average score for boys at this age, 
and few report having delinquent peers. Girls 
start out at low levels of misbehavior, high self-control,
even lower levels of delinquency, and
report hardly any delinquent peers; they have 
good relations with their parents and good ties 
with school. Thus, at age 12, girls are not delin- 
quent, not misbehaving, while boys are misbe- 
having a little, but not yet delinquent. 
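
The trajectories described here are simply group means of object scores, ordered by age. As a rough sketch, assuming the object scores of the person-period records are already available from a fitted solution, they could be computed as follows. The records and the function name are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict

# Hypothetical person-period records:
# (group, age, object score on dimension 1, object score on dimension 2).
records = [
    ("boy", 12, -0.10, 0.30), ("boy", 12, -0.30, 0.10),
    ("boy", 13, -0.50, 0.40), ("boy", 13, -0.70, 0.20),
    ("girl", 12, 0.60, 0.10), ("girl", 12, 0.40, 0.30),
    ("girl", 13, 0.20, 0.00), ("girl", 13, 0.40, 0.20),
]

def trajectories(rows):
    """Average the object scores per (group, age) and order them by age."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for group, age, x, y in rows:
        cell = sums[(group, age)]
        cell[0] += x
        cell[1] += y
        cell[2] += 1
    paths = defaultdict(list)
    # Sorting the (group, age) keys puts the ages in order within each group.
    for (group, age), (sx, sy, n) in sorted(sums.items()):
        paths[group].append((age, sx / n, sy / n))
    return dict(paths)

paths = trajectories(records)
for group, points in paths.items():
    print(group, points)
```

Connecting the successive points of each list in the plot of the solution gives one trajectory per group. The same function applied to grades instead of ages, or to respondents with and without missing waves, yields the other comparisons discussed in this section.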

It is remarkable how for boys as well as girls 
the biggest shift on the first dimension, that 
captures delinquency and misbehavior, occurs 
from age 12 to 13. Both swing leftward. For 
both, this means an increase in misbehavior, 
and hardly any increase in delinquency. Mov- 
ing on to age 14, we see how the trajectory 
for boys and girls starts to move to the bot- 
tom half of the picture. For boys this implies 
that they move to a higher delinquency level 
and start reporting having delinquent peers.
From age 13 onwards, self-control and misbe- 
havior remain fairly constant. For girls, we see 
basically the same change, though there is a 
level difference. Both boys and girls now start, 
from age 14 onwards, reporting less positive 
feelings towards school. By age 15, delinquency 
levels are definitely—for boys as well as girls— 
higher than what they were at age 14; misbehav- 
ior levels are more or less the same, and they 
may even be slightly less by age 15 for boys. 
Girls’ ties with school are now less strong than 
those of boys, and they also report much poorer 
relations with their parents than boys do at this 
age. Self-control remains fairly stable. 

This trend continues in similar fashion up to 
age 16, but by age 17 a remarkable phenomenon 
takes place as delinquency levels, for boys as 
well as girls, though more marked for boys, 
decline. Maybe this has to do with the fact that
most students at this age are in their final
(exam) year. Those students still in the sample
at age 18 (probably a select group) are marked
by increased delinquency levels if they are
boys, and by strongly decreased misbehavior
and decreased delinquency if they are girls.

All in all, we see how, over age, misbehavior 
jumps from ages 12 to 13 and continues to grow 
only a little and declines towards the end of 
the school career for boys, a little earlier than 
for girls. Delinquency increases up to age 16, 
although the growth levels off after age 15. Rela- 
tions with school decrease in similar fashion 
for boys as well as girls; relations with parents 
decrease much more for girls. 

Changes over the various grades could be 
drawn in the same manner. This would show 
how for girls the trajectory spreads from the 
upper part to the lower part of the figure, show- 
ing hardly any spread over the first dimen- 
sion. This shows that over their school career 
girls are much more stable in their criminal 
behavior and misbehavior than boys are. Over 
the school grades, girls in fact develop in the
sense of a slight increase in delinquency and
a decrease in misbehavior. When the lines were
drawn, boys were shown to make a swing in
misbehavior, increasing from year 1 to 2, and
subsequently decreasing slowly. For delinquency
the decrease occurs only after year 3.

Not all respondents had filled out the ques- 
tionnaire at every time point. To investigate 
whether those respondents who were missing 
at one or more waves had a different response 
pattern than those who didn’t, we computed 
the average object score of students with at least 
one missing wave. It turns out that this average
object score is (-.156, -.254), showing that
these students are on average more disgrun- 
tled with school, report worse relations with 
their parents, and score relatively high on delin- 
quency and somewhat lower on misbehavior. 
Those students who were present only on the 
first wave have an even more marked average 
location at (-.503, .117), with higher misbehav- 
ior and higher delinquency scores. Sample attri- 
tion is different for boys and girls: boys who 
are missing at least one wave have an average
object score of (-.442, .071), i.e., high on
delinquency and somewhat on the low side on 
misbehavior, and reporting below average rela- 
tions with school, but still fairly average rela- 
tions with their parents. On the other hand, 
girls who are missing at least one wave have 
an average object score of (.233, -.696), which
implies that these girls score low on misbehav- 
ior, report bad relations with school, and partic- 
ularly so with parents, and score below average 
on delinquency. Thus, while for boys attrition 
appears to be associated most with misbehav- 
ior and delinquency, for girls bad relations with 
their parents are characteristic. 


5 Extensions 


As we showed above, the models discussed 
here can easily be used to analyze a dataset 
that has been constructed such that autoregres- 
sive types of models can be analyzed. Bijleveld 
and De Leeuw (1991) presented a model that 
analyzes the state space model, and allows 
us to quantify any noninterval variables. The
state space model specifies relations between 
a set of input variables and a set of time- 
dependent output variables, that are channeled 
through a latent variable, called the state vari- 
able, that captures both the time dependence 
through an autoregressive model, as well as 
channels the dependence between input and 
output variables. The model as such is explic- 
itly asymmetric. The state variable can be one 
or higher dimensional. While easily applicable,
the model has not come into broad use, mainly
because of a lack of user-friendly software. The
analysis program DYNAMALS was designed
for the analysis of datasets collected on one
respondent that contain many replications over
time. DYNAMALS has been extended to also
analyze data gathered for more than
one respondent (Bijleveld and Bijleveld, 1997).
Quantifications are stable over time and over 
respondents. The latent state values vary across 
respondents and over time. For each respon- 
dent, a latent state trajectory can be drawn. 
In general, when higher dimensional solutions 
are sought, DYNAMALS tends to emphasize 
interindividual differences in the first dimen- 
sion(s), and emphasizes intraindividual differ- 
ences in the following dimensions. 
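
To make the structure of the state space model concrete, the following sketch simulates a linear dynamic system in which the inputs reach the outputs only through a latent state that also depends on its own past. It covers only the linear part of the model; the optimal scaling step of DYNAMALS is not included, and the matrix names F, G and H as well as all numerical values are assumptions made for this illustration.

```python
import numpy as np

def simulate_state_space(F, G, H, inputs, x0, noise_sd=0.0, seed=0):
    """Simulate x_t = F x_{t-1} + G u_t + e_t and y_t = H x_t + v_t.

    F: state transition (autoregressive part), G: input-to-state weights,
    H: state-to-output weights, inputs: (T, m) array of input variables.
    """
    rng = np.random.default_rng(seed)
    T = inputs.shape[0]
    p = F.shape[0]
    q = H.shape[0]
    states = np.zeros((T, p))
    outputs = np.zeros((T, q))
    x_prev = np.asarray(x0, dtype=float)
    for t in range(T):
        # The state collects the time dependence and the input effects.
        x_t = F @ x_prev + G @ inputs[t] + noise_sd * rng.standard_normal(p)
        states[t] = x_t
        # The outputs depend on the inputs only through the state.
        outputs[t] = H @ x_t + noise_sd * rng.standard_normal(q)
        x_prev = x_t
    return states, outputs

# Hypothetical one-dimensional state, two inputs and two outputs.
F = np.array([[0.8]])
G = np.array([[1.0, 0.5]])
H = np.array([[1.0], [-0.5]])
inputs = np.ones((5, 2))
states, outputs = simulate_state_space(F, G, H, inputs, x0=np.zeros(1))
print(states.ravel())
print(outputs)
```

The asymmetry of the model is visible in the code: the inputs never enter the output equation directly. Plotting the rows of `states` against time gives the latent state trajectory of one respondent.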

Linear dynamic systems analysis for several 
subjects with optimal scaling has a number of 
advantages over ordinary time-series modeling. 
The most prominent of these is the fact that cat- 
egorical data can be analyzed. A second type 
of advantage is in the area of stability. Where 
cross-sectional methods derive stability from 
a suitably large number of replications over 
subjects, and where time-series models derive 
stability from a suitably large number of replica- 
tions over time, N>1 DYNAMALS derives sta- 
bility from time points as well as respondents. 

Van Buuren (1990) developed the SERIALS 
program that combines a state space or lin- 
ear dynamic system-type of model with the 
Box-Tiao transform proposed by Box and Tiao 
(1977), a method that extracts components from
multiple time series, in such a way that these
components are related as strongly as possible 
to lagged versions of themselves. Thus, the cor- 
relations between the lag(0) and lag(d) versions 
of a dataset must be as high as possible. This 
implies that the technique seeks those compo- 
nents that can be constructed from the data that 
forecast themselves as well as possible. Because 
of the forecasting constraint, the extracted com- 
ponents are mostly pretty smooth, and the tech- 
nique can thus also be viewed as a smoother of 
wild series. For details, see Box and Tiao (1977) 
and Van Buuren (1990). The optimal scaling 
happens under special constraints that ensure 
that the quantifications of the lagged versions 
of the variables are identical, so that they can 
be interpreted as the same variable. 
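
The idea of components that forecast themselves as well as possible can be illustrated with a small numerical sketch. It assumes prediction from the series at lag 1 only and interval-level data, so it is neither the SERIALS algorithm nor a full account of the Box-Tiao transform, and no optimal scaling takes place. The function name and the simulated series are hypothetical.

```python
import numpy as np

def most_predictable_components(X):
    """Rank linear combinations of the series by their lag-1 predictability."""
    X = np.asarray(X, dtype=float)
    past = X[:-1] - X[:-1].mean(axis=0)
    present = X[1:] - X[1:].mean(axis=0)
    # Least-squares regression of the present values on the lagged values.
    B, *_ = np.linalg.lstsq(past, present, rcond=None)
    predicted = past @ B
    n_obs = present.shape[0]
    cov_total = present.T @ present / n_obs
    cov_pred = predicted.T @ predicted / n_obs
    # Eigenvalues of cov_total^{-1} cov_pred: the share of a component's
    # variance that is reproduced by the forecast.
    ratios, weights = np.linalg.eig(np.linalg.solve(cov_total, cov_pred))
    order = np.argsort(ratios.real)[::-1]
    return ratios.real[order], weights.real[:, order]

# Simulated example: one smooth autoregressive source and one wild source,
# observed only through two mixtures of both.
rng = np.random.default_rng(1)
T = 2000
smooth = np.zeros(T)
for t in range(1, T):
    smooth[t] = 0.95 * smooth[t - 1] + rng.standard_normal()
wild = rng.standard_normal(T)
X = np.column_stack([smooth + 0.5 * wild, smooth - 0.5 * wild])

ratios, weights = most_predictable_components(X)
components = (X - X.mean(axis=0)) @ weights
print(ratios)
```

In this setup the first component should recover the smooth source, with a predictability ratio close to one, and the second the wild source, with a ratio close to zero. This is the sense in which the technique acts as a smoother.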


6 Software 


Multiple correspondence analysis, nonlinear 
principal components analysis, and nonlin- 
ear multiset analysis are all, probably most 
easily, available in SPSS in the procedure
CATEGORIES. SAS has versions of
the first two techniques. Versions in R,
often with fancy extensions, are downloadable
from Jan de Leeuw’s website
http://www.cuddyvalley.org/psychoR, which
incidentally is a very good website to browse 
for novel extensions to these techniques. 


7 Concluding remarks 


In this chapter we discussed a number of 
exploratory techniques that use optimal scal- 
ing to quantify categorical data. The techniques 
can handle mixed-measurement-level categori- 
cal longitudinal data, whose values are rescaled, 
after which the categorical variables are treated 
as continuous variables. We discussed two 
approaches to the longitudinal analysis of cat- 
egorical data using optimal scaling techniques. 
In the first, we adapt the data box in such a 
way that the data can be fed into the tech- 
nique and analyzed as if they are cross-sectional. 
Any categorical variables are transformed. When
analyzing the LONG matrix or person-period 
file, these transformations are stable over time, 
and it is therefore assumed that the variables 
themselves do not change over time. When 
the technique produces a solution we return 
to the longitudinal properties of the data box, 
retrieving time points and respondents, and 
making visible any changes by drawing respon- 
dents’ trajectories from one time point to the 
next. Even though this approach is not with- 
out methodological hazards—the most notable 
being the risk of relationships induced by het- 
erogeneity over time points—it is flexible, easy 
to apply and conceptually attractive. The mul- 
tiset techniques can be adapted to investigate 
summary as well as group-specific changes 
in time. They can be used to explore non- 
linear growth for interval-level data, and can 
incorporate specific additional research ques- 
tions, such as questions about the discrimination 
between respondents or groups of respondents. 
Autoregressive setups can be made. For analyz- 
ing mixed-measurement-level longitudinal data, 
these techniques thus make it possible to investi- 
gate flexibly, and untroubled by statistical com- 
plications because of serial dependence, various 
kinds of substantive issues regarding growth. 

Another advantage of the use of optimal scal- 
ing techniques for the analysis of longitudinal 
data is that they can handle a modest propor- 
tion of missing data relatively easily. The tech- 
niques HOMALS, PRINCALS, and OVERALS 
simply exclude any missing observations from 
the loss function. The less user-accessible pro- 
grams for the analysis of long series of data use 
similar methods. Compared with confirmatory 
techniques, the two approaches share the disadvantage
that no information on stability is provided.
For those cases where we have sufficient
replications over subjects, this can of course 
be overcome by jackknifing or bootstrapping, 
although this is cumbersome. 

All in all, the optimal scaling techniques presented
here are essentially exploratory techniques
useful for generating rather than testing
hypotheses. They are relatively untroubled by 
technical problems due to dropout and attrition 
common in longitudinal research. They provide 
a flexible and conceptually attractive frame- 
work for investigating a variety of exploratory 
research questions of a longitudinal nature, 
incorporating categorical variables of mixed 
measurement level. 
Chapter 22


An introduction to latent class analysis 
C. Mitchell Dayton 


Latent class analysis (LCA) is a method for 
analyzing categorical data from sources such as 
achievement test items, rating scales, attitude 
items, etc. It is assumed that the population 
from which respondents arose is divided into 
subgroups within which responses to the vari- 
ables are independent. These subgroups are not 
directly observed so the focus of LCA is on 
characterizing the latent structure of observed 
categorical data. 


1 Introduction 


Latent class analysis is a relatively new approach
to analyzing multivariate categorical data. 
Often, for multivariate data the focus of anal- 
ysis is to explain the interrelations among 
the variables. Factor analysis, for example, 
was developed over one hundred years ago 
(Spearman, 1904) to “explain” the positive 
intercorrelations among achievement measures. 
Whereas for continuous variables interrelations 
are usually quantified by means of correlation 
coefficients, for categorical variables interre- 
lations involve conditional probabilities that 
cannot easily be summarized in simple numer- 
ical indices. LCA, in its various applications, 
is based on the notion of conditional indepen- 
dence. Roughly speaking, it is assumed that 
the population of interest can be divided into 
non-overlapping subsets of respondents (i.e., 
latent classes) such that, within each subset, the
categorical variables are independent. Note that
these subsets are not observed and, indeed, may 
not even be observable. Consider, for example, 
variables A and B that have been observed for a 
sample of respondents. Dependence between A 
and B would mean that the rates of occurrence of 
the various categories of B are not the same across
the various categories of A. In particular, for 
two Yes/No attitude items, dependence would 
mean that the Yes response to B was produced 
at different rates for respondents saying Yes 
versus No to A. Such lack of independence is 
more or less universally observed for categorical 
variables and a variety of statistical methods 
has been developed to explore this depen- 
dence (e.g., hierarchical log-linear modeling 
and correspondence analysis). The analytical 
approach taken in LCA is conceptually similar 
to factor analysis in that it is assumed that, if 
latent class membership were known, then the 
variables would be (conditionally) indepen- 
dent. Note that in classical factor analysis it 
is assumed that partialling out the effects of 
the hypothetical (latent) factors reduces the 
intercorrelations among the variables to zero 
(i.e., makes them conditionally independent). 
References that provide historical and theo- 
retical background for LCA include Lazarsfeld 
and Henry (1968), Goodman (1974), Haberman 
(1979), Bartholomew (1987), Hagenaars 
(1990), Von Eye and Clogg (1994), Heinen
(1996), Rost and Langeheine (1997), Dayton
(1999), and Hagenaars and McCutcheon (2002). 
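
The notion of conditional independence can be made concrete with a small numerical sketch. It assumes a hypothetical population of two equally large latent classes and two Yes/No items, A and B, that are independent within each class; all probabilities are made up for the illustration.

```python
# Hypothetical two-class population; all values are invented for illustration.
theta = [0.5, 0.5]        # latent class proportions
p_yes_a = [0.9, 0.1]      # Pr(A = Yes | class)
p_yes_b = [0.9, 0.1]      # Pr(B = Yes | class)

def joint(a_yes, b_yes):
    """Pr(A, B) in the whole population, mixing over the latent classes."""
    total = 0.0
    for c in range(len(theta)):
        pa = p_yes_a[c] if a_yes else 1.0 - p_yes_a[c]
        pb = p_yes_b[c] if b_yes else 1.0 - p_yes_b[c]
        total += theta[c] * pa * pb   # A and B are independent within class c
    return total

# Rate of Yes on B among respondents saying Yes versus No to A.
p_b_given_a_yes = joint(True, True) / (joint(True, True) + joint(True, False))
p_b_given_a_no = joint(False, True) / (joint(False, True) + joint(False, False))

print(p_b_given_a_yes, p_b_given_a_no)
```

Although A and B are independent within each class by construction, the two conditional rates differ markedly in the mixed population. The observed dependence between the items is thus entirely accounted for by the latent classes.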

From a mathematical point of view, LCA is 
a method for finite mixture modeling where 
the latent classes represent the components of 
the mixture. Since the variables are categorical 
and, in general, no ordering properties for the 
categories are assumed, each component of the 
mixture is a product of multinomial probabil- 
ity functions. As mixture models have become 
more popular in the behavioral sciences, one 
finds references to latent classes in a broader 
range of contexts such as item response theory 
and structural equation modeling, but these 
applications are not pursued in this chapter. 

Before turning to the mathematical model for 
LCA and the necessary rubrics for estimation, 
model fit, etc., we present a few types of appli- 
cations for LCA to provide some context for 
these methods. 


1. Linear hierarchies: the notion of linear 
hierarchies arises in several research areas 
including Guttman scaling, developmental 
sequences, and learning/acquisition seq- 
uences (Dayton, 1999). Latent class methods 
have been applied in this area since the work 
by Proctor (1970), Goodman (1975), and 
Dayton and Macready (1976; 1980). 

2. Medical diagnosis: latent class methods have 
been used to assess the value of labora- 
tory tests as diagnostic indicators when no 
gold standard exists (e.g., Rindskopf and 
Rindskopf, 1986). 

3. Identifying typologies: latent classes may be 
interpreted as representing clusters of sim- 
ilar respondents within a population and, 
as such, LCA may be viewed as a modern 
approach to cluster analysis (Vermunt and 
Magidson, 2000). 


2 Model for LCA 


For the $i$th respondent, let $Y_i = \{y_{ij}\}$ be responses
to $j = 1, \ldots, J$ categorical variables. For convenience,
the response categories for item $j$
are represented by consecutive integers,
$1, 2, \ldots, R_j$. Thus, the data may be viewed as a
$J$-way contingency table where the total number
of cells is the product $R_1 \cdot R_2 \cdots R_J$. Assuming $C$
latent classes, the mathematical model for LCA
can be represented as:

\[ \Pr(Y_i) = \sum_{c=1}^{C} \theta_c \prod_{j=1}^{J} \prod_{r=1}^{R_j} \alpha_{cjr}^{\delta_{ijr}} \tag{1} \]
The theoretical proportions of respondents in
the latent classes are $\theta_c$ for $c = 1, \ldots, C$ with, of
course, the sum of the proportions being one.
Also, $\alpha_{cjr}$ is the theoretical conditional probability
for response $r$ to variable $j$ given membership
in latent class $c$. The terms $\delta_{ijr}$ are indicators
that allow for the inclusion of appropriate conditional
probabilities based on the responses, i.e.:

\[ \delta_{ijr} = \begin{cases} 1 & \text{iff } y_{ij} = r \\ 0 & \text{otherwise} \end{cases} \tag{2} \]


Within any specific latent class, $c$, the probability
for a response is:

\[ \Pr(Y_i \mid c) = \prod_{j=1}^{J} \prod_{r=1}^{R_j} \alpha_{cjr}^{\delta_{ijr}} \tag{3} \]
The product of conditional probabilities in
equation (3) is based on the assumption that
responses to the variables are independent
within latent classes. To exemplify this, consider
three dichotomous (1, 2) variables, two
latent classes and the response {121} for some
particular respondent. Within latent class 1,
the theoretical probability for this response
is the product $\alpha_{111}\alpha_{122}\alpha_{131}$ and within latent
class 2 the probability is $\alpha_{211}\alpha_{222}\alpha_{231}$. Then,
the unconditional probability for this response
is the weighted sum
$\theta_1\alpha_{111}\alpha_{122}\alpha_{131} + (1-\theta_1)\alpha_{211}\alpha_{222}\alpha_{231}$,
where $\theta_2 = 1-\theta_1$. Given the model
in equation (1), the likelihood for a sample of $n$
respondents is

\[ \Lambda = \prod_{i=1}^{n} \sum_{c=1}^{C} \theta_c \prod_{j=1}^{J} \prod_{r=1}^{R_j} \alpha_{cjr}^{\delta_{ijr}} \tag{4} \]


and the logarithm of this likelihood is:

\[ \lambda = \sum_{i=1}^{n} \ln \left\{ \sum_{c=1}^{C} \theta_c \prod_{j=1}^{J} \prod_{r=1}^{R_j} \alpha_{cjr}^{\delta_{ijr}} \right\} \tag{5} \]
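
The model equations translate directly into a few lines of code. The sketch below evaluates the within-class probability, the unconditional probability and the log-likelihood for the example of three dichotomous variables and two latent classes. The parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical parameter values: two latent classes, three dichotomous items.
theta = [0.6, 0.4]   # latent class proportions, summing to one
# alpha[c][j][r]: probability of category r + 1 on item j + 1 in class c + 1.
alpha = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],   # latent class 1
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],   # latent class 2
]

def prob_given_class(y, c):
    """Within-class probability: product of the probabilities picked out by y."""
    p = 1.0
    for j, r in enumerate(y):
        p *= alpha[c][j][r - 1]   # categories are coded 1, 2, ...
    return p

def prob(y):
    """Unconditional probability: class-weighted sum of within-class values."""
    return sum(theta[c] * prob_given_class(y, c) for c in range(len(theta)))

def log_likelihood(sample):
    """Log-likelihood: sum of the log probabilities over the respondents."""
    return sum(math.log(prob(y)) for y in sample)

y = (1, 2, 1)                       # the response {121}
print(prob_given_class(y, 0))       # alpha_111 * alpha_122 * alpha_131
print(prob_given_class(y, 1))       # alpha_211 * alpha_222 * alpha_231
print(prob(y))                      # weighted sum over the two classes
print(log_likelihood([(1, 2, 1), (1, 1, 1), (2, 2, 2)]))
```

The indicator terms of the model do not appear explicitly in the code. Indexing `alpha` by the observed category does the same job, because every factor with an indicator of zero equals one.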


Subject to identification conditions discussed
below, maximum likelihood estimates for the
parameters in equation (1) can be derived by
computing partial derivatives of $\lambda$ with respect
to the parameters, setting these partial derivatives
to 0 and solving simultaneously. If the
latent class proportion for the $C$th class is rewritten
as $\theta_C = 1 - \sum_{c=1}^{C-1} \theta_c$, the partial derivative
with respect to the latent class proportion for
class $c$ is:

\[ \frac{\partial \lambda}{\partial \theta_c} = \sum_{i=1}^{n} \frac{\Pr(Y_i \mid c)}{\Pr(Y_i)} - \sum_{i=1}^{n} \frac{\Pr(Y_i \mid C)}{\Pr(Y_i)} = 0 \tag{6} \]


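Similarly, a partial derivative with respect to
a conditional probability for an item is of the
form:

\[ \frac{\partial \lambda}{\partial \alpha_{cjr}} = \sum_{i=1}^{n} \frac{\theta_c}{\Pr(Y_i)} \, \frac{\partial \Pr(Y_i \mid c)}{\partial \alpha_{cjr}} = 0 \tag{7} \]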


Although the partial derivatives in equations 
(6) and (7) are relatively easy to compute, 
they are nonlinear in the parameters and must 
be solved by iterative procedures. Various 
microcomputer programs are available for LCA, 
including LEM (Vermunt, 1997) that is based 
on an expectation-maximization (EM) algorithm,
Latent Gold (Vermunt and Magidson, 2000) 
that incorporates Bayesian methods and MPlus 
(Muthén and Muthén, 1998) that uses New- 
ton methods. The likelihood equation (4) is 
written on the assumption that the sample of 
respondents is a simple random sample from 
some population. However, for data arising 
from complex survey designs that incorporate 
clusters and sampling weights these methods 
must be modified as described by Patterson, 
Dayton and Graubard (2002). They define a
pseudo-log-likelihood that incorporates sampling
weights, $w_i$, for each respondent:

\[ \lambda_w = \sum_{i=1}^{n} w_i \ln \sum_{c=1}^{C} \theta_c \Pr(Y_i \mid c) \tag{8} \]


These sampling weights are intended to com- 
pensate for under- or oversampling strata, non- 
response and related factors (see Kish, 1965; 
Kish and Frankel, 1974; Kalton, 1989). As far 
as estimation per se is concerned, incorporat- 
ing sampling weights into LCA is relatively 
straightforward and programs such as MPlus 
and Latent Gold have this capability. More dif- 
ficult issues revolve around obtaining proper 
estimates for standard errors and setting up 
appropriate significance tests for data arising 
from complex surveys. As noted in Patterson, 
Dayton and Graubard (2002), this is still an 
active research area in LCA. 
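
The iterative estimation can be sketched with a basic EM scheme. The code below is a minimal illustration for an unrestricted latent class model, not the algorithm of any of the programs mentioned; the function names, the simulated data and the fixed number of iterations are assumptions. Optional sampling weights enter as in the pseudo-log-likelihood of equation (8), and no standard errors are computed.

```python
import numpy as np

def class_joint(Y, theta, alpha):
    """theta_c * Pr(Y_i | c) for every respondent (rows) and class (columns)."""
    joint = np.tile(theta, (Y.shape[0], 1))
    for j in range(Y.shape[1]):
        joint *= alpha[j][:, Y[:, j]].T
    return joint

def fit_lca_em(Y, n_classes, weights=None, n_iter=500, seed=0):
    """EM for latent classes; categories in Y are coded 0, 1, ..., R_j - 1."""
    Y = np.asarray(Y)
    n, J = Y.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    n_cats = [int(Y[:, j].max()) + 1 for j in range(J)]
    theta = np.full(n_classes, 1.0 / n_classes)
    # Random starting values; alpha[j] has one row per class, rows sum to one.
    alpha = [rng.dirichlet(np.ones(R), size=n_classes) for R in n_cats]
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent.
        joint = class_joint(Y, theta, alpha)
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: weighted class proportions and conditional probabilities.
        wpost = post * w[:, None]
        theta = wpost.sum(axis=0) / w.sum()
        for j in range(J):
            for r in range(n_cats[j]):
                alpha[j][:, r] = wpost[Y[:, j] == r].sum(axis=0)
            alpha[j] /= alpha[j].sum(axis=1, keepdims=True)
    # Weighted log-likelihood at the final estimates.
    loglik = float(np.sum(w * np.log(class_joint(Y, theta, alpha).sum(axis=1))))
    return theta, alpha, loglik

# Simulated data: two classes (proportions .6 and .4), four dichotomous items.
rng = np.random.default_rng(42)
n = 2000
in_class_1 = rng.random(n) < 0.6
p_yes = np.where(in_class_1, 0.9, 0.2)
Y = (rng.random((n, 4)) < p_yes[:, None]).astype(int)

theta_hat, alpha_hat, loglik = fit_lca_em(Y, n_classes=2)
print(theta_hat, loglik)
```

Because of label switching, the class with proportion near .6 may come out as either the first or the second class. Rerunning the function with different values of `seed` is a simple way to check whether the same solution is reached from different starting values.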


3 Model fit 


Assessing the fit of a model to categorical data 
often entails a comparison between observed 
frequencies and expected frequencies where the 
latter are derived by substituting maximum- 
likelihood estimates for parameters in the theo- 
retical model. This approach is practical unless 
the number of variables and/or numbers of 
categories for the variables becomes exces- 
sively large and data are sparse (i.e., many 0 
and near-0 cell frequencies occur). There are 
three different chi-square goodness-of-fit statis- 
tics in common use: Pearson, likelihood-ratio, 
and Read-Cressie. All three statistics involve
observed cell frequencies, $F_t$, for $t = 1, \ldots, T$, and
expected frequencies, $\hat{F}_t$, for $t = 1, \ldots, T$, where
the number of cells is $T = \prod_{j=1}^{J} R_j$. Assuming a
total sample size of $n$, the expected cell frequency
for a cell with observation $Y_i$ is:

\[ \hat{F}_t = n \cdot \Pr(Y_i), \quad i \in t \tag{9} \]


The degrees of freedom for all three chi-square
goodness-of-fit statistics are equal to $T - p - 1$,
where $p$ is the number of independent parameters
estimated when fitting the latent class
model (e.g., for four dichotomous variables and 
two latent classes, the value of p is 9 and com- 
prises one latent class proportion, four condi- 
tional probabilities for the variables in the first 
latent class, and four more conditional probabil- 
ities for the variables in the second latent class). 
As discussed below, these degrees of freedom 
are based on the assumption that the model 
is identified and unrestricted. For nonidenti- 
fied models, only restricted solutions are pos- 
sible and degrees of freedom must be adjusted 
accordingly. 
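
The parameter count and the degrees of freedom for an identified, unrestricted model can be written as a short computation. The function names below are hypothetical.

```python
from math import prod

def lca_free_parameters(n_categories, n_classes):
    """Independent parameters of an unrestricted latent class model."""
    # C - 1 latent class proportions, because the proportions sum to one.
    class_proportions = n_classes - 1
    # R_j - 1 conditional probabilities per item within each class.
    conditional_probs = n_classes * sum(r - 1 for r in n_categories)
    return class_proportions + conditional_probs

def lca_degrees_of_freedom(n_categories, n_classes):
    """T - p - 1, valid only if the model is identified and unrestricted."""
    n_cells = prod(n_categories)
    return n_cells - lca_free_parameters(n_categories, n_classes) - 1

four_dichotomous = [2, 2, 2, 2]
print(lca_free_parameters(four_dichotomous, 2))      # p for the example above
print(lca_degrees_of_freedom(four_dichotomous, 2))   # 16 - 9 - 1
```

For four dichotomous variables and two latent classes this reproduces the value of 9 parameters given in the text and leaves 6 degrees of freedom.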

The Pearson statistic, $X^2$, is based on the
arithmetic difference between observed and
expected frequencies:

\[ X^2 = \sum_{t=1}^{T} \frac{(F_t - \hat{F}_t)^2}{\hat{F}_t} \tag{10} \]

The likelihood-ratio statistic, $G^2$ (sometimes
denoted $L^2$), is based on the natural logarithm
of ratios of observed and expected frequencies:

\[ G^2 = 2 \sum_{t=1}^{T} F_t \ln \frac{F_t}{\hat{F}_t} \tag{11} \]


The Read-Cressie statistic, $I^2$, is a so-called
power-divergence chi-square test (Read and
Cressie, 1988) that is intended to be less sensitive
to sparse data than either $X^2$ or $G^2$. It contains
an adjustable constant, $\lambda$, that yields $G^2$ in
the limit as it approaches 0 and yields $X^2$ when set to 1. The
value most frequently used for $\lambda$ is 2/3 (e.g., this
is the value used by the program LEM).

\[ I^2 = \frac{2}{\lambda(\lambda + 1)} \sum_{t=1}^{T} F_t \left[ \left( \frac{F_t}{\hat{F}_t} \right)^{\lambda} - 1 \right] \tag{12} \]


In practice, $X^2$, $G^2$ and $I^2$ are often very similar
in value but when $X^2$ and $G^2$ differ substantially
it may be more appropriate to use $I^2$ as
a measure of fit. However, it should be kept in
mind that sparse data result in major distributional
disturbances for all of these chi-square
statistics since they are derived, in theory, from 
asymptotic properties associated with contin- 
gency tables. Also, it should be noted that 
accepting the null hypothesis for a goodness- 
of-fit test merely indicates that discrepancies 
between what is expected on the basis of a 
model and the data are within acceptable limits 
of chance and that many different models may 
provide “good” fit for a given dataset. 
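
The three chi-square statistics are straightforward to compute once observed and expected frequencies are available. The following sketch uses hypothetical frequencies for four cells; the function name is an assumption. Cells with an observed frequency of zero are skipped in the likelihood-ratio sum, following the usual convention that such terms contribute nothing.

```python
import math

def chi_square_statistics(observed, expected, lam=2.0 / 3.0):
    """Pearson, likelihood-ratio and Read-Cressie statistics for one table."""
    cells = list(zip(observed, expected))
    pearson = sum((f - e) ** 2 / e for f, e in cells)
    likelihood_ratio = 2.0 * sum(f * math.log(f / e) for f, e in cells if f > 0)
    read_cressie = (2.0 / (lam * (lam + 1.0))) * sum(
        f * ((f / e) ** lam - 1.0) for f, e in cells
    )
    return pearson, likelihood_ratio, read_cressie

# Hypothetical observed and expected frequencies with equal totals.
observed = [30, 20, 25, 25]
expected = [25.0, 25.0, 25.0, 25.0]

x2, g2, i2 = chi_square_statistics(observed, expected)
print(x2, g2, i2)

# The adjustable constant links the statistics: 1 gives the Pearson value,
# and values very close to 0 approach the likelihood-ratio value.
print(chi_square_statistics(observed, expected, lam=1.0)[2])
print(chi_square_statistics(observed, expected, lam=1e-6)[2])
```

With the default constant of 2/3 the Read-Cressie value falls between the other two for these frequencies, which is typical when the table is not sparse.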

Given the limitations of goodness-of-fit tests 
and the fact that with large sample sizes the 
discrepancies between observed and expected 
frequencies can, subjectively, appear relatively 
small, various descriptive measures have been 
developed for use in LCA. A useful measure 
is the Index of Dissimilarity that ranges from 
0 to 1 and is based on the absolute discrepancies 
between observed and expected frequencies: 


\[ I_D = \frac{\sum_{t=1}^{T} \left| F_t - \hat{F}_t \right|}{2n} \tag{13} \]


In practice, “satisfactory” fit is suggested by values
of $I_D$ less than .05. Also, the index of dissimilarity
can be used to compare alternative
models. Another interesting descriptive measure
is the two-point mixture index of model
fit, $\pi^*$, developed by Rudas, Clogg and Lindsay
(1994). This index is the minimum proportion
of respondents that must be deleted from
the dataset in order to achieve perfect fit (i.e.,
$G^2 = 0$) for a given model. Although it is easily
interpretable, there is no simple computational
approach (see Dayton, 2003 for not-so-simple
approaches) and it is not currently available in
latent class programs.
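
In contrast to the two-point mixture index, the index of dissimilarity is a one-line computation. The sketch below uses hypothetical frequencies and a hypothetical function name.

```python
def index_of_dissimilarity(observed, expected):
    """Half the summed absolute discrepancies, as a proportion of n."""
    n = sum(observed)
    return sum(abs(f - e) for f, e in zip(observed, expected)) / (2.0 * n)

# Hypothetical observed and expected frequencies for four cells.
observed = [30, 20, 25, 25]
expected = [25.0, 25.0, 25.0, 25.0]

print(index_of_dissimilarity(observed, expected))
```

The value can be read as the proportion of respondents who would have to be moved to other cells for the observed frequencies to match the expected frequencies exactly.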

An issue of concern whenever applying 
advanced statistical models to real data is iden- 
tification of the model. An identified model 
yields unique maximum likelihood estimates 
whereas a nonidentified model requires restric- 
tions on the parameters in order to arrive at 
unique estimates. Trivially, one cannot estimate 
more independent parameters from a dataset
than there are sufficient statistics for estima- 
tion. In the case of latent class analysis, the 
upper limit on the number of parameters that 
can be estimated is given by the number of 
potential data cells (e.g., $2^J$ for $J$ dichotomous
variables). Obvious identification issues are 
associated with the usual restrictions on prob- 
abilities; i.e., latent class proportions and con- 
ditional probabilities for variables sum to one 
as indicated above. However, these restrictions 
are very easy to incorporate into parameter 
estimation procedures. There is also the issue 
described as label switching which refers to 
the fact that, for example, interchanging what 
is labeled as the first versus the second latent 
class places the solution in different, albeit sym- 
metric, locations in the parameter space (e.g., 
a split of .6, .4 versus .4, .6 for the latent 
class proportions). This, again, is not a seri- 
ous identification issue but must be kept in 
mind when comparing solutions for two differ- 
ent groups of respondents such as males and 
females. Thus, the solutions may be similar 
but the first latent class estimated for males 
may be the equivalent to the second latent 
class estimated for females. More serious iden- 
tification issues are associated with situations 
in which there is, in fact, no unique solu- 
tion unless explicit restrictions are imposed on 
the estimates. A classic example involves three 
latent classes for four dichotomous variables. 
The number of cells is $2^4 = 16$ and it would
seem that models with up to 15 parameters 
could be estimated with positive degrees of 
freedom remaining for assessing fit. However, 
an unrestricted three-class model based, appar- 
ently, on 2+3(4) = 14 independent parameters 
is not identified and requires one restriction 
to yield unique estimates. This restriction, for 
example, could involve setting one of the conditional
probabilities for a variable to one. There
is no straightforward method for ascertaining 
whether or not a given latent class model is 
identified. In theory, for an identified model,
the asymptotic variance-covariance matrix for
maximum-likelihood estimators is of full rank
but, in general, this matrix cannot be found by
analytical methods and must be approximated 
numerically. Programs such as LEM compute 
such approximations but may fail to uncover an 
unidentified model. The best advice is to rerun 
the analysis with several different starting val- 
ues for the iterative solution. If identical results 
are found, it is reasonable to assume that the 
model is identified unless there is some consis- 
tent boundary value (e.g., 1) for the conditional 
probabilities of one or more variables. 
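
A related numerical check, which complements rerunning the analysis from different starting values, is to inspect the rank of the Jacobian matrix of the cell probabilities with respect to the parameters: a rank below the number of parameters signals that the model is not locally identified at the point considered. This is a sketch of that idea for dichotomous variables and not the procedure of any particular program; the function name and the parameter values are hypothetical.

```python
import itertools
import numpy as np

def lca_jacobian(theta, alpha):
    """Derivatives of all cell probabilities for dichotomous items.

    theta: (C,) class proportions; alpha: (C, J) probabilities of response 1.
    Columns: theta_1, ..., theta_{C-1}, followed by every alpha[c, j].
    """
    theta = np.asarray(theta, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    C, J = alpha.shape
    rows = []
    for pattern in itertools.product([1, 0], repeat=J):
        y = np.array(pattern)
        # item_probs[c, j]: probability of the observed response in class c.
        item_probs = np.where(y == 1, alpha, 1.0 - alpha)
        class_probs = item_probs.prod(axis=1)
        # The last proportion is one minus the sum of the others.
        d_theta = class_probs[:-1] - class_probs[-1]
        d_alpha = []
        for c in range(C):
            for j in range(J):
                others = np.delete(item_probs[c], j).prod()
                sign = 1.0 if y[j] == 1 else -1.0
                d_alpha.append(theta[c] * sign * others)
        rows.append(np.concatenate([d_theta, d_alpha]))
    return np.array(rows)

# Two classes, four dichotomous items: 9 parameters.
jac2 = lca_jacobian([0.6, 0.4],
                    [[0.90, 0.80, 0.70, 0.85],
                     [0.20, 0.30, 0.10, 0.25]])
# Three classes, four dichotomous items: 14 parameters.
jac3 = lca_jacobian([0.5, 0.3, 0.2],
                    [[0.90, 0.80, 0.70, 0.85],
                     [0.20, 0.30, 0.10, 0.25],
                     [0.50, 0.60, 0.40, 0.55]])

rank2 = np.linalg.matrix_rank(jac2, tol=1e-8)
rank3 = np.linalg.matrix_rank(jac3, tol=1e-8)
print(jac2.shape, rank2)
print(jac3.shape, rank3)
```

For the two-class model the rank should equal the number of parameters. For the three-class model it should fall short of 14, in line with the need for one restriction noted above. Like any numerical rank computation, the result depends on the tolerance and on the parameter values at which the matrix is evaluated.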


4 Model comparisons 


Most applications of LCA entail fitting more 
than one model to a set of data. For nested mod- 
els, with a very important exception as noted 
below, differences in chi-square goodness-of-fit 
tests (i.e., usually $G^2$ values) can be used to
assess the statistical significance of the differ- 
ence in fit with degrees of freedom equal to the 
difference in degrees of freedom for the mod- 
els being compared. Thus, it can be decided 
whether or not the more complex model (i.e., 
the model with more independent parameters 
being estimated) provides better fit to the data 
than the simpler model. Note that the better fit- 
ting model may not actually represent “good” 
fit to the data but is simply the better of the 
models being compared. 

The major exception to the use of difference 
chi-square tests is for the case of models with 
different numbers of latent classes which is, 
unfortunately, one of the cases of major inter- 
est in LCA. If one fits, say, two-class, three- 
class and four-class models to a set of data, 
these models are, in fact, nested. Despite this, 
the differences in $X^2$ or $G^2$ values are not distributed
as theoretical chi-square variates. This
is a general result that applies to comparing 
mixture models based on different numbers of 
components, including mixtures of normal dis- 
tributions, mixtures of Poisson distributions, 
etc. (Titterington, Smith and Makov, 1985). 
Although the reason for the failure of these 
tests is technically complex, it arises because, 
in moving from the more complex to the less 
complex model, the latent class proportion for 
the more complex model is constrained to zero, 
which is a boundary of the parameter space 
(Bishop, Fienberg and Holland, 1975). 

An alternative for selecting a “best” model is based on notions
from information theory. Akaike (1973; 1974) proposed an estimate
for the Kullback and Leibler (1951) information measure that can
be used for ordering alternate models that are estimated from the
same data. Akaike interprets Kullback-Leibler information as a
measure of the distance between the “true” model for data and
models actually being estimated from the data. His estimate, AIC,
may be viewed as a penalized form of the log-likelihood for a
model and is of the form
$\mathrm{AIC} = -2\ln(L) + 2p = -2\lambda + 2p$, where
$\lambda = \ln(L)$ is the maximized log-likelihood and $p$ is the
number of independent parameters estimated. Then, the
decision-making strategy is to compute the AIC statistic for each
model under consideration and select the model with min(AIC) as
the preferred model among those being compared. An advantage of
the min(AIC) strategy is that the models may be nested or
non-nested. Various related measures have been proposed as
alternatives to Akaike AIC. In general, these measures incorporate
different (usually heavier) penalty terms and include Schwarz
(1978) BIC with penalty term $\log_e(n) \cdot p$ and Bozdogan
(1987) CAIC with penalty term $[\log_e(n)+1] \cdot p$, where $n$
is the sample size. In this chapter, we utilize both the Akaike
AIC and Schwarz BIC for model comparisons but suggest that readers
consider the merits of alternatives (see Dayton, 1999).
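The min(AIC) and min(BIC) strategies can be illustrated with the fit statistics reported later in Table 22.2. The sketch below assumes that G² is the likelihood-ratio statistic against the saturated model, so that −2 times the log-likelihood equals G² plus a constant shared by all models fitted to the same data; ranking models on G² plus the penalty term then gives the same ordering as ranking on AIC or BIC. The helper name `relative_criteria` is hypothetical.

```python
import math

N = 27516          # respondents in the GSS abortion example
CELLS = 2**5 - 1   # independent cells in the 2x2x2x2x2 table

# (number of latent classes, G², residual degrees of freedom), Table 22.2
fits = [(1, 43930.183, 26), (2, 8781.974, 20), (3, 256.684, 14),
        (4, 25.582, 8), (5, 1.820, 2)]

def relative_criteria(g2, df, n):
    """AIC and BIC up to an additive constant common to all models."""
    p = CELLS - df  # number of independent parameters estimated
    return g2 + 2 * p, g2 + math.log(n) * p

scores = {c: relative_criteria(g2, df, N) for c, g2, df in fits}
best_aic = min(scores, key=lambda c: scores[c][0])
best_bic = min(scores, key=lambda c: scores[c][1])
print(best_aic, best_bic)  # -> 5 4
```

The heavier BIC penalty is what moves the preferred model from five classes to four, in line with the AIC and BIC columns of Table 22.2.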


5 Unconstrained LCA 


One major application of LCA is to look for 
relatively homogeneous latent subgroups of 
respondents. As such, LCA may be viewed as 
a clustering algorithm that identifies “fuzzy” 
clusters. That is, as shown below, respondents 
may be classified into the latent classes on a 


post hoc basis but the assignment is proba- 
bilistic rather than deterministic. To illustrate 
this approach, we use data for 27,516 respon- 
dents to five survey items dealing with abor- 
tion taken from the General Social Survey (GSS) 
for the years 1972 through 1998. The items, in 
the order presented below, dealt with whether 
or not the respondent would favor allowing a 
woman to receive a legal abortion if: (1) there is 
a strong chance of serious defect in the baby; (2) 
she is married and does not want any more chil- 
dren; (3) the woman’s own health is seriously 
endangered by the pregnancy; (4) she became 
pregnant as a result of rape; and (5) she is not 
married and does not want to marry the man. 
Frequencies for the 32 response patterns are 
displayed in Table 22.1 and latent class pro- 
portions, along with various summary statis- 
tics, based on fitting one to five latent classes 
are summarized in Table 22.2. As is often true 
for large sample sizes, the various fit statistics 
present a complex picture that requires careful 
interpretation. First, only the five-class model 
results in a nonsignificant likelihood-ratio chi- 
square statistic and this model also satisfies 
the min(AIC) criterion. Second, the four-class 
model has min(BIC) and both the four- and five- 
class models have very small values for the 
Index of Discrepancy (.0022 and .0004, respec- 
tively). And, third, the latent class proportions 
for the three largest classes for the three- and 
four-class models are very similar. Given the 
small proportion in the final latent class for the 
four-class model, it seems reasonable to inter- 
pret the three-class model with the caveat that 
relatively small additional clusters of respon- 
dents might be reliable and of interest to some 
researchers. A plot of the three-class solution 
(Figure 22.1) shows a distinct pattern for each 
class. The largest class, about 49%, essentially 
agrees that an abortion for any of the five rea- 
sons should be legal. A relatively small class, 
about 13%, opposes an abortion for any of the 
given reasons but with opposition being some- 
what abated for reasons of the mother’s health 
Table 22.1 Frequencies for five GSS abortion items 

Defect  No more  Health  Rape  Single  Observed frequency
YES     YES      YES     YES   YES     11,212
YES     YES      YES     YES   NO      1,265
YES     YES      YES     NO    YES     21
YES     YES      YES     NO    NO      79
YES     YES      NO      YES   YES     36
YES     YES      NO      YES   NO      20
YES     YES      NO      NO    YES     5
YES     YES      NO      NO    NO      4
YES     NO       YES     YES   YES     1,471
YES     NO       YES     YES   NO      6,859
YES     NO       YES     NO    YES     42
YES     NO       YES     NO    NO      1,254
YES     NO       NO      YES   YES     17
YES     NO       NO      YES   NO      169
YES     NO       NO      NO    YES     3
YES     NO       NO      NO    NO      124
NO      YES      YES     YES   YES     68
NO      YES      YES     YES   NO      50
NO      YES      YES     NO    YES     7
NO      YES      YES     NO    NO      18
NO      YES      NO      YES   YES     21
NO      YES      NO      YES   NO      16
NO      YES      NO      NO    YES     2
NO      YES      NO      NO    NO      21
NO      NO       YES     YES   YES     95
NO      NO       YES     YES   NO      1,206
NO      NO       YES     NO    YES     7
NO      NO       YES     NO    NO      1,081
NO      NO       NO      YES   YES     14
NO      NO       NO      YES   NO      273
NO      NO       NO      NO    YES     4
NO      NO       NO      NO    NO      2,052
Total                                  27,516


Table 22.2 Latent classes fitted to five GSS abortion items 

# Latent classes  G²         DF  p-value  LC proportions                I_D     AIC       BIC
1                 43930.183  26  0.000    1.00                          0.4779  145241.9  145283.0
2                 8781.974   20  0.000    .517, .483                    0.1619  110105.7  110196.2
3                 256.684    14  0.000    .486, .387, .127              0.0060  101592.4  101732.2
4                 25.582     8   0.001    .486, .371, .126, .017        0.0022  101373.3  101562.4
5                 1.820      2   0.400    .484, .312, .127, .062, .014  0.0004  101361.6  101600.0

Figure 22.1 Three-class solution for GSS abortion items
(vertical axis: Prob (Yes); horizontal axis: abortion item)

or rape. Finally, a class representing about 37% 
of respondents has sharply divided opinions, 
being favorable toward reasons of a birth defect, 
mother’s health or rape, but being unfavorable 
toward reasons of not wanting more children or 
being unmarried. It is interesting to note that, by 
combining the latent classes, one could charac- 
terize the sample as favorable to abortion (com- 
bine classes 1 and 2) or unfavorable to abortion 
(combine classes 2 and 3). 

Note that the conditional probabilities displayed
in Figure 22.1 are consistent with the 
idea that the three latent classes are ordered. 
That is, all conditional probabilities for the first 


class are (equal to or) larger than those for the 
second class and all conditional probabilities 
for the second class are (equal to or) larger than 
those for the third class. This ordering occurred 
naturally given this dataset but order-restricted 
analyses can be imposed using options avail- 
able in LCA programs such as LEM or Latent 
Gold. 

Given maximum likelihood estimates for 
latent class proportions and conditional prob- 
abilities for the variables, Bayes theorem can 
be used to classify respondents into the latent 
classes. In the context of LCA, the theorem takes 
the form 


$$\Pr(c \mid Y_i) \propto \theta_c \cdot \Pr(Y_i \mid c) \qquad (14)$$

which means that for a given response, $Y_i$, the (posterior)
probability for latent class c is proportional to the latent class
proportion times the likelihood associated with that response.
Then, classification is carried out by assigning a response to the
latent class for which the posterior probability is largest (i.e.,
the modal posterior class). 
Table 22.3 displays the classifications for the 
three class solution based on the five GSS abor- 
tion items. If the frequencies for the modal 
classes (in bold in the table) are summed and 
converted to proportions, they are .510, .360 


Table 22.3 Bayes classifications for five GSS abortion items 

                                                           Bayes probabilities
Defect  No more  Health  Rape  Single  Observed frequency  Class 1  Class 2  Class 3
YES     YES      YES     YES   YES     11,212              0.998    0.002    0.000
YES     YES      YES     YES   NO      1,265               0.683    0.317    0.000
YES     YES      YES     NO    YES     21                  0.827    0.173    0.000
YES     YES      YES     NO    NO      79                  0.021    0.971    0.008
YES     YES      NO      YES   YES     36                  0.989    0.011    0.000
YES     YES      NO      YES   NO      20                  0.285    0.704    0.010
YES     YES      NO      NO    YES     5                   0.461    0.513    0.026
YES     YES      NO      NO    NO      4                   0.002    0.596    0.402
YES     NO       YES     YES   YES     1,471               0.729    0.271    0.000
YES     NO       YES     YES   NO      6,859               0.012    0.987    0.001
YES     NO       YES     NO    YES     42                  0.027    0.971    0.002
YES     NO       YES     NO    NO      1,254               0.000    0.967    0.033
YES     NO       NO      YES   YES     17                  0.335    0.662    0.003
YES     NO       NO      YES   NO      169                 0.002    0.937    0.061
YES     NO       NO      NO    YES     3                   0.004    0.817    0.179
YES     NO       NO      NO    NO      124                 0.000    0.254    0.746
NO      YES      YES     YES   YES     68                  0.955    0.045    0.000
NO      YES      YES     YES   NO      50                  0.087    0.893    0.020
NO      YES      YES     NO    YES     7                   0.167    0.774    0.060
NO      YES      YES     NO    NO      18                  0.001    0.490    0.510
NO      YES      NO      YES   YES     21                  0.777    0.194    0.029
NO      YES      NO      YES   NO      16                  0.006    0.332    0.662
NO      YES      NO      NO    YES     2                   0.005    0.128    0.867
NO      YES      NO      NO    NO      21                  0.000    0.011    0.989
NO      NO       YES     YES   YES     95                  0.108    0.886    0.007
NO      NO       YES     YES   NO      1,206               0.001    0.909    0.091
NO      NO       YES     NO    YES     7                   0.001    0.747    0.252
NO      NO       YES     NO    NO      1,081               0.000    0.181    0.819
NO      NO       NO      YES   YES     14                  0.014    0.598    0.388
NO      NO       NO      YES   NO      273                 0.000    0.103    0.897
NO      NO       NO      NO    YES     4                   0.000    0.033    0.967
NO      NO       NO      NO    NO      2,052               0.000    0.003    0.998
Total                                  27,516


and .131 which closely correspond to the latent 
class proportions (i.e., .486, .387 and .127). 
There is a relationship between classification 
into the latent classes and the count of Yes 
responses to the GSS abortion items but there 
are notable exceptions. In particular, counts of 
4 and 5 are unique to latent class 1 and counts 
of 0 and 1 are unique to latent class 3, but counts 
of 3 occur in both latent classes 1 and 2 and 
counts of 2 occur in both latent classes 2 and 
3. Thus, in this regard, the latent class analysis 
provides a more highly nuanced interpretation 
of the responses. 
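The classification rule in equation (14) can be sketched as follows. The latent class proportions are the three-class estimates from Table 22.2, but the conditional probabilities of a Yes response are hypothetical values chosen only to mimic the pattern described for Figure 22.1; they are not the estimates behind Table 22.3, so the posteriors printed here will not reproduce that table. The function name `posterior` is likewise an assumption of this sketch.

```python
def posterior(theta, alpha, y):
    """Posterior class probabilities for one response pattern.

    theta: latent class proportions
    alpha: alpha[c][j] = P(item j = Yes | class c)
    y:     tuple of 0/1 responses (1 = Yes)
    """
    joint = []
    for t, a in zip(theta, alpha):
        like = 1.0
        for a_j, y_j in zip(a, y):
            like *= a_j if y_j else 1.0 - a_j
        joint.append(t * like)  # theta_c * Pr(Y | c), as in equation (14)
    total = sum(joint)          # normalizing constant
    return [v / total for v in joint]

theta = [0.486, 0.387, 0.127]         # Table 22.2, three-class model
alpha = [                             # hypothetical illustration values
    [0.99, 0.95, 0.99, 0.99, 0.93],   # class 1: favorable throughout
    [0.90, 0.05, 0.97, 0.92, 0.06],   # class 2: opposed for nonmedical reasons
    [0.20, 0.02, 0.40, 0.25, 0.02],   # class 3: opposed throughout
]

for y in [(1, 1, 1, 1, 1), (1, 0, 1, 1, 0), (0, 0, 0, 0, 0)]:
    post = posterior(theta, alpha, y)
    modal = post.index(max(post)) + 1  # modal posterior class
    print(y, [round(p, 3) for p in post], "-> class", modal)
```

Because the assignment rests on the largest posterior probability, patterns with two nearly equal posteriors (such as the row with 0.490 and 0.510 in Table 22.3) are classified with much less certainty than the extreme patterns.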


6 Multiple groups LCA 


Comparisons among subgroups within a sam- 
ple are often of interest. Typical comparison 
groups are males/females, age groups, ethnic 
groups, etc. The model for LCA in equation (1) 
can be adapted to accommodate grouping vari- 
ables (Clogg and Goodman, 1984; 1985; Dayton 
and Macready, 2002). We consider only a single grouping variable
but the method is easily extended to more complex stratifications.
Letting an observation for a respondent in group g be
$Y_{ig} = \{y_{igj}\}$, the model becomes: 

$$\Pr(Y_{ig}) = \sum_{c=1}^{C} \theta_{c|g} \prod_{j=1}^{J} \prod_{r=1}^{R_j} \alpha_{cjr|g}^{\delta_{igjr}} \qquad (15)$$

Note that the latent class proportions, $\theta_{c|g}$, the item
conditional probabilities, $\alpha_{cjr|g}$, as well as the
indicators, $\delta_{igjr}$, include a subscript for group
membership. The grouping latent class model in equation (15) is
referred to as a heterogeneous 
model. In the heterogeneous model, the latent 
classes may or may not have a consistent inter- 
pretation across groups. In typical applications, 
this heterogeneous model is compared to a par- 
tially homogeneous model as well as to a com- 
pletely homogeneous model. For a model with 
G groups, the partially homogeneous model is 
defined by the restrictions 


$$\alpha_{cjr|g} = \alpha_{cjr} \quad \text{for } g = 1, \ldots, G \qquad (16)$$

For this model, the sizes of the latent classes 
are allowed to vary across groups but the condi- 
tional probabilities for the variables that char- 
acterize the nature of the latent classes remain 
the same so that the interpretation of the classes 
can not vary from group to group. On the other 
hand, the homogeneous model is defined by 
both the restrictions in equation (16) and 


$$\theta_{c|g} = \theta_c \quad \text{for } g = 1, \ldots, G \qquad (17)$$

In effect, grouping is ignored when these 


restrictions are applied. Statistical comparisons 
among heterogeneous, partially homogeneous, 


and completely homogeneous models can be 
carried out using difference chi-square tests 
since these models are nested and do not 
involve setting latent class proportions to boundary values. 

To illustrate these methods, the dataset for 
the five GSS abortion items was divided into 
younger (age 42 and below) versus older (age 43 
and above) age groups. This division reduces 
the sample size to 27,697 because of miss- 
ing data for age and represents about an even 
split of the respondents (53% versus 47%). On 
an item-by-item basis, the younger group uni- 
formly tends to give more yes responses to the 
abortion items. Homogeneous, partial homoge- 
neous, and heterogeneous models are summa- 
rized in Table 22.4. Note that the latent class 
proportions are presented in the same order as 
for the earlier analyses; i.e., the first class is rel- 
atively favorable to abortions for the stated rea- 
sons, the third class is relatively opposed and 
the second class is opposed for what might be 
characterized as nonmedical reasons. Although 
all of the chi-square fit statistics suggest lack of 
model fit, the indices of discrepancy are about 
2% or less for both the partial homogeneous and 
heterogeneous models. If we choose to interpret 
the partial homogeneous model, this is sup- 
ported by the fact that this model satisfies the 
min(BIC) criterion. As shown in Table 22.4, 
the younger group has a higher proportion in 
the favorable class, a lower proportion in the 
unfavorable class, and about the same propor- 
tion in the second class when compared with 
the older group. 
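The difference chi-square comparisons among the three grouping models can be sketched from the G² and DF values reported in Table 22.4. The upper-tail chi-square probability below uses a standard closed-form series for integer degrees of freedom so that only the standard library is needed; the function name `chi2_sf` is an assumption of this sketch, and a statistical package would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    z = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # normal density at chi
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.sqrt(2.0 * math.pi) * z * total
    # Odd df: two-sided normal tail plus a finite series in odd powers of chi.
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * z * total

# (G², DF) from Table 22.4
models = {
    "homogeneous": (539.872, 45),
    "partial homogeneous": (373.050, 43),
    "heterogeneous": (274.179, 28),
}

def difference_test(restricted, general):
    """Difference in G² and DF between a restricted and a more general model."""
    g2_r, df_r = models[restricted]
    g2_g, df_g = models[general]
    d_g2, d_df = g2_r - g2_g, df_r - df_g
    return d_g2, d_df, chi2_sf(d_g2, d_df)

for pair in [("homogeneous", "partial homogeneous"),
             ("partial homogeneous", "heterogeneous")]:
    d_g2, d_df, p = difference_test(*pair)
    print(pair, round(d_g2, 3), d_df, p)
```

With a sample this large, both differences (166.822 on 2 degrees of freedom and 98.871 on 15) are far in the tail, which is one reason the indices of discrepancy and BIC are consulted alongside the formal tests.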


Table 22.4 Grouping latent class models fitted to five GSS abortion items grouped by age

                                           LC proportions
Model                G²       DF  p-value  Younger group     Older group       I_D     AIC       BIC
Homogeneous          539.872  45  0.000    .491, .384, .126  .491, .384, .126  0.0426  140067.8  140215.9
Partial homogeneous  373.050  43  0.000    .522, .374, .104  .455, .396, .149  0.0209  139902.9  140059.3
Heterogeneous        274.179  28  0.000    .518, .375, .107  .461, .392, .147  0.0063  139834.1  140113.8

7 Scaling models 


One of the first applications of restricted latent 
class models involved assessing the fit of linear 
(Guttman) scales to attitude rating items. These 
same models have applications to learning hier- 
archies and to other outcomes that might be 
expected to occur in some specific sequence over 
time periods (e.g., developmental tasks). Proctor 
(1970), Goodman (1974; 1975) and Dayton and 
Macready (1976), among others, presented mod- 
els in which each scale type was represented by 
a separate latent class and the conditional probabilities for the
variables (e.g., items) were suitably restricted to define a
hierarchic or sequential 
structure. This chapter provides some basic con- 
cepts on this topic; more complete coverage can 
be found in Dayton (1999). 

Consider, for example, three attitude items 
representing increasing degrees of positive 
opinion on some social topic. If responses 
(Y=yes, N=no) to these items conformed to 
a linear scale, the permissible response pat- 
terns would be NNN, YNN, YYN, YYY, whereas 
a response like NYN would be inconsistent 
with a linear scale. In practice, of course, such 
inconsistent responses do occur and the issue 
is whether or not the linear scale is a reason- 
able approximation for observed data. Latent 
class scaling models assume that each per- 
missible response pattern is associated with a 
latent class, so for three items, as above, there 
would be four latent classes. In order to iden- 
tify a latent class with a specific permissi- 
ble response pattern, restrictions are imposed 
on the conditional probabilities for the items. 
A very simple model proposed by Proctor 
(1970) assumes response errors at a constant 
rate across items. Thus, respondents in the 
latent class corresponding to the pattern NNN 
may, in fact, show any of the other seven pos- 
sible response patterns if one, two, or three 
response errors are made. Of course, a respon- 
dent would give the NNN response if no error 
were made. In this way, respondents in any of 


the four latent classes representing the four per- 
missible response patterns may give any possi- 
ble response with greater or lesser probability. 
The formal restrictions on the item conditional 
probabilities are: 


$$\alpha_e = \alpha_{111} = \alpha_{121} = \alpha_{131} = \alpha_{221} = \alpha_{231} = \alpha_{331} = \alpha_{212} = \alpha_{312} = \alpha_{322} = \alpha_{412} = \alpha_{422} = \alpha_{432}$$

Note that $\alpha_e$ can be viewed as a constant error rate that,
depending on the permissible response pattern, corresponds to a
response of Y occurring when the permissible response is N or a
response of N occurring when the permissible response is Y. Thus,
for example, if a respondent comes from the latent class
corresponding to YNN, the response NNN has probability
$\alpha_e(1-\alpha_e)^2$ since there is one response error and two
non-error responses. Similarly, the response YNN has probability
$(1-\alpha_e)^3$ representing three non-errors and the response
NYY has probability $\alpha_e^3$ representing three response
errors. 
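The arithmetic of the Proctor model can be sketched directly: the probability of a response, given a scale type, depends only on how many items disagree with the permissible pattern. The function names and the numerical values for the error rate and the latent class proportions below are hypothetical, chosen only for illustration.

```python
from itertools import product

# Permissible response patterns for three items, one latent class each.
SCALE_TYPES = ("NNN", "YNN", "YYN", "YYY")

def proctor_conditional(response, scale_type, error_rate):
    """P(response | scale type) under a single constant error rate."""
    errors = sum(r != s for r, s in zip(response, scale_type))
    return error_rate ** errors * (1.0 - error_rate) ** (len(response) - errors)

def proctor_marginal(response, class_props, error_rate):
    """Unconditional probability of a response: mixture over scale types."""
    return sum(theta * proctor_conditional(response, s, error_rate)
               for theta, s in zip(class_props, SCALE_TYPES))

alpha_e = 0.10                      # hypothetical error rate
class_props = (0.2, 0.3, 0.3, 0.2)  # hypothetical latent class proportions

# One error and two non-errors for a YNN respondent answering NNN.
print(proctor_conditional("NNN", "YNN", alpha_e))

# All eight response patterns have positive probability.
for resp in ("".join(t) for t in product("YN", repeat=3)):
    print(resp, round(proctor_marginal(resp, class_props, alpha_e), 4))
```

Because every response pattern receives positive probability, inconsistent patterns such as NYN no longer contradict the scale outright; they are simply less likely than the permissible ones.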

The Proctor model was extended by Dayton 
and Macready (1976) to include different types 
of response errors. The intrusion-omission error 
model posits two types of errors corresponding 
to Y replacing N (intrusion) and N replacing Y 
(omission). The formal restrictions on the item 
conditional probabilities are: 


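$$\alpha_I = \alpha_{111} = \alpha_{121} = \alpha_{131} = \alpha_{221} = \alpha_{231} = \alpha_{331}$$ and

$$\alpha_O = \alpha_{212} = \alpha_{312} = \alpha_{322} = \alpha_{412} = \alpha_{422} = \alpha_{432}$$

Other error models include item specific errors: 

$$\alpha_{e1} = \alpha_{111} = \alpha_{212} = \alpha_{312} = \alpha_{412};$$

$$\alpha_{e2} = \alpha_{121} = \alpha_{221} = \alpha_{322} = \alpha_{422}; \text{ and}$$

$$\alpha_{e3} = \alpha_{131} = \alpha_{231} = \alpha_{331} = \alpha_{432}$$

as well as latent-class specific errors: 

$$\alpha_{e1} = \alpha_{111} = \alpha_{121} = \alpha_{131};$$

$$\alpha_{e2} = \alpha_{221} = \alpha_{231} = \alpha_{212};$$

$$\alpha_{e3} = \alpha_{331} = \alpha_{312} = \alpha_{322}; \text{ and}$$

$$\alpha_{e4} = \alpha_{412} = \alpha_{422} = \alpha_{432}$$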
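The intrusion-omission model differs from the Proctor model only in that the error rate depends on the direction of the error. A minimal sketch, with hypothetical rates and a hypothetical function name, is:

```python
def intrusion_omission_conditional(response, scale_type, intrusion, omission):
    """P(response | scale type) with separate intrusion and omission rates.

    intrusion: probability of Y when the permissible response is N
    omission:  probability of N when the permissible response is Y
    """
    prob = 1.0
    for r, s in zip(response, scale_type):
        if s == "N":
            prob *= intrusion if r == "Y" else 1.0 - intrusion
        else:
            prob *= omission if r == "N" else 1.0 - omission
    return prob

# Hypothetical rates for illustration.
alpha_i, alpha_o = 0.20, 0.10

# A YNN respondent answering NNN: one omission, two correct N responses.
print(intrusion_omission_conditional("NNN", "YNN", alpha_i, alpha_o))

# A YNN respondent answering YYN: one intrusion on the second item.
print(intrusion_omission_conditional("YYN", "YNN", alpha_i, alpha_o))
```

Setting the two rates equal recovers the constant-error model, which is why the Proctor model is nested within the intrusion-omission model.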
Although the GSS abortion items were neither designed nor selected
to be consistent with a linear scale, we explore some scaling
models to see how they compare in fit with the unrestricted models
fitted earlier. Based on purely 
subjective judgments gleaned from inspecting 
Figure 22.1, the item ordering 3, 4, 1, 5, 2 was 
selected. Thus, the permissible response pat- 
terns are: NNNNN, NNYNN, NNYYN, YNYYN, 
YNYYY and YYYYY. Table 22.5 summarizes 
results from fitting four error models. The best 
fitting, among these generally poor fitting mod- 
els, is the item-specific error model that meets 
both the min(AIC) and min(BIC) criteria, as 
well as having the most acceptable $I_D$ value.
Note, however, that these values, as well as the G² statistic, are
not nearly as acceptable as those found for an unrestricted
three-class model (Table 22.2). The estimated latent class 
proportions for the permissible response pat- 
terns in the order above are: .090, .040, .046, 
.339, .021 and .464. The first and sixth classes 
correspond to the extreme groups for the unre- 
stricted three-class model and show similar 
latent class proportions to that solution. Note 
that the estimated error rates for two of the six 
classes for the latent-class specific error model 
went to boundary values of 0, which suggests 
identification issues for this model. 


Table 22.5 Linear scale fitted to five GSS abortion items 


8 Covariate LCA 


Although the use of grouping variables provides a useful approach
to incorporating additional manifest variables into LCA, it has
definite limitations. First, if an outside variable is continuous,
rather 
than categorical (e.g., age of respondent), then it 
is necessary to create groups using some more- 
or-less arbitrary cut-values as was illustrated 
above. Second, only a relatively few grouping 
variables can be accommodated in an analy- 
sis. If large numbers of grouping variables are 
used, the cell frequencies become small, result- 
ing in unstable subgroup analyses. And, third, 
LCA with grouping variables can be extremely 
complex in terms of the number of parame- 
ters that are estimated. Dayton and Macready 
(1988) proposed covariate LCA in which a logis- 
tic regression model is written for the relation- 
ship between latent class membership and one 
or more covariates (actually, they proposed a 
more general model, but the logistic model is 
most widely used in practice). The covariates 
can be continuous, categorical using recoding 
when necessary (e.g., dummy-variable coding), 
and/or products of covariates in order to model 
interactions. In effect, all of the modeling pos- 
sibilities available in ordinary multiple regres- 
sion and logistic multiple regression become 
available in the context of LCA. 


Table 22.5 Linear scale fitted to five GSS abortion items

Error model            G²        DF  p-value  Error rates                           I_D    AIC       BIC
Proctor                4202.110  25  0.000    0.040                                 0.095  105515.9  105565.2
Intrusion-omission     3869.276  24  0.000    .082, .024                            0.087  105185.0  105242.6
Item specific          3073.560  21  0.000    .015, .039, .013, .067, .062          0.078  104395.3  104477.5
Latent class specific  6980.709  20  0.000    .004, .195, .000*, .077, .000*, .004  0.099  108304.5  108394.9

*Conditional probability at boundary value; identification issues 
Although covariate LCA can be generalized 
as shown below, we begin by assuming that 
there are just two latent classes and a single 
covariate, Z. The model for the regression of 
latent class membership on the covariate is: 


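$$\theta_{1|Z_i} = g(Z_i \mid \beta) = \frac{e^{\beta_0 + \beta_1 Z_i}}{1 + e^{\beta_0 + \beta_1 Z_i}} \qquad (18)$$

In log-odds form, the model can be written: 

$$\ln\left(\frac{\theta_{1|Z_i}}{1 - \theta_{1|Z_i}}\right) = \beta_0 + \beta_1 Z_i \qquad (19)$$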


Note that this is exactly the same model 
posited in ordinary logistic regression analysis 
except that membership in the latent classes 
is unknown rather than manifest. Combining 
the model in equation (18) with the latent class 
model for two latent classes yields: 


$$\Pr(Y_i \mid Z_i) = \sum_{c=1}^{2} \theta_{c|Z_i} \prod_{j=1}^{J} \prod_{r=1}^{R_j} \alpha_{cjr}^{\delta_{ijr}} \qquad (20)$$


Note that the probability of latent class mem- 
bership is dependent on the covariate but that 
conditional probabilities for the variables are independent of the
covariate. In the terminology of models with grouping variables,
this 
is a partially homogeneous model. That is, 
the latent structure defined by the conditional 
probabilities for the variables is assumed to 
be constant over the different values for the 
covariate. 

Covariate LCA can accommodate multiple 
covariates and cases with three or more latent 
classes. For multiple covariates, the obvious 
modification is to expand the additive model 
for log-odds: 


$$\ln\left(\frac{\theta_{1|Z_i}}{1 - \theta_{1|Z_i}}\right) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_p Z_{pi} \qquad (21)$$

where $Z_i$ is a vector of p covariates. One 
approach to extending the model to more than 


two latent classes is to select one of the classes 
as a reference (usually the last class, C) and 
then create log-odds models comparing each of 
the remaining classes with the reference class. 
Using this coding, by default, the logistic regres- 
sion coefficients for the reference class are each 
equal to 0. Then there are C-1 log-odds models 
analogous to equation (21) where the logistic 
regression models for classes c=1...,C-1 are of 
the form: 


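$$\theta_{c|Z_i} = g(Z_i \mid \beta) = \frac{e^{\beta_{0c} + \beta_{1c} Z_i}}{1 + \sum_{c'=1}^{C-1} e^{\beta_{0c'} + \beta_{1c'} Z_i}} \qquad (22)$$

and for class C is: 

$$\theta_{C|Z_i} = g(Z_i \mid \beta) = \frac{1}{1 + \sum_{c'=1}^{C-1} e^{\beta_{0c'} + \beta_{1c'} Z_i}} \qquad (23)$$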


For illustration, we return to the five GSS abortion items where a
three-class model provided reasonable fit. Using age as a
continuous covariate, the estimated conditional probabilities for
the five items were very similar to those shown in Figure 22.1 and
are not summarized here. Thus, the structure of reported abortion
attitudes is essentially the same as found from the LCA without
age as a covariate. The logistic regression coefficients for the
first two latent class proportions were estimated as
$\beta_{01} = 2.084$, $\beta_{11} = -.016$ and
$\beta_{02} = 1.639$, $\beta_{12} = -.011$. Figure 22.2 displays
the distinctive patterns for the three latent classes. With
increasing age, the expected percentage in the first latent class
that is relatively favorable to abortion declines steadily from
about 55% to about 40%. On the other hand, the third latent class
that is relatively unfavorable to abortion increases from less
than 10% to more than 20% with increasing age. The second latent
class that tends to oppose abortion for nonmedical reasons is
relatively stable over the range of ages in the GSS database. 
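The age profiles described above follow directly from equations (22) and (23) and the reported coefficients. The sketch below evaluates the three latent class proportions at a few ages; the ages shown, including the assumed range of 18 to 89, and the function name `class_proportions` are illustrative choices rather than values taken from the analysis.

```python
import math

def class_proportions(z, coefs):
    """Latent class proportions given a covariate value z.

    coefs: (beta_0c, beta_1c) for classes 1..C-1; class C is the
    reference class, with both coefficients fixed at 0.
    """
    expo = [math.exp(b0 + b1 * z) for b0, b1 in coefs]
    denom = 1.0 + sum(expo)
    return [e / denom for e in expo] + [1.0 / denom]

# Estimated coefficients for classes 1 and 2 reported in the text.
coefs = [(2.084, -0.016), (1.639, -0.011)]

for age in (18, 40, 60, 89):  # illustrative ages
    props = class_proportions(age, coefs)
    print(age, [round(p, 3) for p in props])
```

At the younger end the first class is near 54% and the third is just under 9%; at the older end they are near 40% and 21%, with the second class staying close to 38% to 40% throughout, which matches the description of Figure 22.2.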
Figure 22.2 GSS abortion item cluster profiles 
with age as covariate 


9 Software notes 


All analyses reported in this paper were run 
with Latent Gold (Vermunt and Magidson, 
2000) from datasets in SPSS file format. All Bayes constants
were set to 0, which results in maximum 
likelihood estimates rather than posterior-mode 
Bayes estimates. Equivalent analyses could 
have been generated using LEM (Vermunt, 
1997) but this would have required the creation
of new datasets since inputting SPSS 
datasets is not an option with LEM. The GSS 
abortion items were taken from public-access 
databases maintained by the National Opinion 
Research Center in Chicago, Illinois, at the web- 
site http://webapp.icpsr.umich.edu/GSS/. 


References 


Akaike, H. (1973). Information theory and an exten- 
sion of the maximum likelihood principle. In 


B. N. Petrov and F. Csáki (eds), Second International
Symposium on Information Theory, pp. 
267-281. Budapest: Akademiai Kiado. 

Akaike, H. (1974). A new look at the statistical model 
identification. IEEE Transactions on Automatic 
Control, AC-19: 716-723. 

Bartholomew, D. J. (1987). Latent Variable Models 
and Factor Analysis. London: Charles Griffin & Co. 

Bishop, Y., Fienberg, S. and Holland, P. (1975) Dis- 
crete Multivariate Analysis. MIT Press. 

Bozdogan, H. (1987). Model-selection and Akaike’s 
information criterion (AIC): The general theory 
and its analytical extensions. Psychometrika, 52: 
345-370. 

Clogg, C. C. and Goodman, L. A. (1984). Latent struc- 
ture analysis of a set of multidimensional tables. 
Journal of the American Statistical Association, 79: 
762-771. 

Clogg, C. C. and Goodman, L. A. (1985). Simulta- 
neous latent structure analysis in several groups. 
In N. B. Tuma (ed.), Sociological Methodology. 
San Francisco: Jossey-Bass. 

Dayton, C. M. (1999). Latent Class Scaling Analysis. 
Quantitative Applications in the Social Sciences 
Series No. 126. Thousand Oaks, CA: Sage. 

Dayton, C. M. (2003). Applications and computa- 
tional strategies for the two-point mixture index of 
fit. British Journal of Mathematical and Statistical 
Psychology, 56: 1-13. 

Dayton, C. M. and Macready, G. B. (1976). A proba- 
bilistic model for validation of behavioral hierar- 
chies. Psychometrika, 41: 189-204. 

Dayton, C. M. and Macready, G. B. (1980). A 
scaling model with response errors and intrinsi- 
cally unscalable respondents. Psychometrika, 45: 
343-356. 

Dayton, C. M. and Macready, G. B. (1988). 
Concomitant-variable latent class models. Jour- 
nal of the American Statistical Association, 83: 
173-178. 

Dayton, C. M. and Macready, G. B. (2002). Use of cat- 
egorical and continuous covariates in latent class 
analysis. In Allan McCutcheon and Jacques Hage- 
naars (eds), Advances in Latent Class Modeling. 
Cambridge, UK: Cambridge University Press. 

Goodman, L. A. (1974). Exploratory latent structure 
analysis using both identifiable and unidentifiable 
models, Biometrika, 61: 215-231. 

Goodman, L. A. (1975). A new model for scaling 
response patterns: An application of the quasi- 
independence concept. Journal of the American 
Statistical Association, 70: 755-768. 

Haberman, S. J. (1979). Analysis of Quantitative 
Data, Vol. 2. New York: Academic Press. 




Hagenaars, J. A. (1990). Categorical Longitudinal 
Data. Newbury Park: Sage. 

Hagenaars, J. A. and McCutcheon, A. L. (eds) (2002). 
Applied Latent Class Analysis. Cambridge, UK: 
Cambridge University Press. 

Heinen, T. (1996). Latent Class and Discrete Trait 
Models. Advanced Quantitative Techniques in the 
Social Sciences Series 6. Thousand Oaks: Sage. 

Kalton, G. (1989). Modeling considerations: Dis- 
cussion from a survey sampling perspective. In 
D. Kasprzyk, G. Duncan, G. Kalton and M. P. 
Singh (eds), Panel Survey, pp. 575-585. New York: 
Wiley. 

Kish, L. (1965). Survey Sampling. New York: Wiley. 

Kish, L. and Frankel, M. P. (1974). Inference from 
complex samples. Journal of the Royal Statistical 
Society, Series B, 36: 1-37. 

Kullback, S. and Leibler, R. A. (1951). On information 
and sufficiency. Annals of Mathematical Statistics, 
22: 79-86. 

Lazarsfeld, P. F. and Henry, N. W. (1968). Latent 
Structure Analysis. Boston: Houghton Mifflin. 

Muthén, L. K. and Muthén, B. O. (1998). Mplus: 
The Comprehensive Modeling Program for Applied 
Researchers, User’s Guide. Los Angeles, CA: 
Muthén & Muthén. 

Patterson, B., Dayton, C. M. and Graubard, B. 
(2002). Latent class analysis of complex survey 
data: Application to dietary data. Journal of the 
American Statistical Association, 97: 721-729. 

Proctor, C. H. (1970). A probabilistic formulation and 
statistical analysis of Guttman scaling. Psychome- 
trika, 35: 73-78. 


Read, T. R. C. and Cressie, N. A. C. (1988). Goodness- 
of-Fit Statistics for Discrete Multivariate Data. New 
York: Springer-Verlag. 

Rindskopf, R. and Rindskopf, W. (1986). The value of 
latent class analysis in medical diagnosis. Statis- 
tics in Medicine, 5: 21-27. 

Rost, J. and Langeheine, R. (eds) (1997). Appli- 
cations of Latent Trait and Latent Class 
Models in the Social Sciences. New York: 
Waxmann. 

Rudas, T., Clogg, C. C. and Lindsay, B. G. (1994). 
A new index of fit based on mixture methods 
for the analysis of contingency tables. Journal 
of the Royal Statistical Society, Series B, 56: 
623-639. 

Schwarz, G. (1978). Estimating the dimension of a 
model. Annals of Statistics, 6: 461-464. 

Spearman, C. E. (1904). “General intelligence” objec- 
tively determined and measured. American Jour- 
nal of Psychology, 5: 201-293. 

Titterington, D. M., Smith, A. F. M. and Makov, U. E. 
(1985). Statistical Analysis of Finite Mixture Mod- 
els. New York: Wiley. 

Vermunt, J. K. (1997), The LEM user manual. WORC 
Paper. Tilburg University, The Netherlands. 

Vermunt, J. K. and Magidson, J. (2000). Latent 
class cluster analysis. In J. A. Hagenaars and 
A. L. McCutcheon (eds), Advances in Latent Class 
Models. Cambridge, UK: Cambridge University 
Press. 

Von Eye, A. and Clogg, C. C. (eds) (1994). Latent 
Variables Analysis: Applications for Developmen- 
tal Research. Thousand Oaks: Sage. 




Chapter 23 


Latent class models in longitudinal 
research 
Jeroen K. Vermunt, Bac Tran and Jay Magidson 


1 Introduction 


This article presents a general framework for 
the analysis of discrete-time longitudinal data 
using latent class models. The encompassing 
model is the mixture latent Markov model, 
a latent class model with time-constant and 
time-varying discrete latent variables. The time- 
constant latent variables are used to deal with 
unobserved heterogeneity in the change pro- 
cess, whereas the time-varying discrete latent 
variables are used to correct for measurement 
error in the observed responses. By allowing for 
direct relationships between the latent states at 
consecutive time points, one obtains the typi- 
cal Markovian transition or first-order autore- 
gressive correlation structure. Moreover, each 
of three distinct submodels can include covari- 
ates, thus addressing separate important issues 
in longitudinal data analysis: observed and 
unobserved individual differences, autocorrela- 
tion, and spurious observed change resulting 
from measurement error. 

It is shown that most of the existing latent class 
models for longitudinal data are restricted spe- 
cial cases of the mixture latent Markov model 
presented, which itself is an expanded version 
with covariates of the mixed Markov latent class 
model by Van de Pol and Langeheine (1990). 
The most relevant restricted special cases are 


mover-stayer models (Goodman, 1961), mix- 
ture Markov models (Poulsen, 1982), latent 
(or hidden) Markov models (Baum et al., 
1970; Collins and Wugalter, 1992; Van de Pol 
and De Leeuw, 1986; Vermunt, Langeheine
and Böckenholt, 1999; Wiggins, 1973), mixture
growth models (Nagin, 1999; Muthén, 2004; 
Vermunt, 2006), and mixture latent growth 
models (Vermunt, 2003; 2006) for repeated 
measures, as well as the standard multiple- 
group latent class model for analyzing data 
from repeated cross-sections (Hagenaars, 1990). 

The next section presents the mixture latent 
Markov model. Then we discuss its most 
important special cases and illustrate these 
with an empirical example. We end with a 
short discussion of various possible exten- 
sions of our approach. The first appendix pro- 
vides details on parameter estimation using the 
Baum-Welch algorithm. The second appendix 
contains model setups for the syntax version 
of the Latent GOLD program (Vermunt and 
Magidson, 2005) that was used for estimating 
the example models. 


2 The mixture latent Markov model 


Assume that we have a longitudinal data set 
containing measurements for N subjects at T+1 
occasions. The mixture latent Markov model
is a model containing five types of variables:
response variables, time-constant explanatory 
variables, time-varying explanatory variables, 
time-constant discrete latent variables, and 
time-varying discrete latent variables. For sim- 
plicity of exposition, we will assume that 
response variables are categorical, and that 
there is at most one time-constant and one time- 
varying latent variable. These are, however, 
not limitations of the framework we present 
which can be used with continuous response 
variables, multiple time-constant latent vari- 
ables, and multiple time-varying latent vari- 
ables. Our mixture latent Markov model is an 
expanded version of the mixed Markov latent 
class model proposed by Van de Pol and Lange- 
heine (1990): it contains time-constant and 
time-varying covariates and it can be used when 
the number of time points is large. 

Let $y_{itj}$ denote the response of subject $i$ at
occasion $t$ on response variable $j$, where
$1 \le i \le N$, $0 \le t \le T$, $1 \le j \le J$, and
$1 \le y_{itj} \le M_j$. Note that $J$ is the total number
of response variables and $M_j$ the number of categories
for response variable $j$. The vector of responses for
subject $i$ at occasion $t$ is denoted as $\mathbf{y}_{it}$
and the vector of responses at all occasions as
$\mathbf{y}_i$. The vectors of time-constant predictors and
of time-varying predictors at occasion $t$ are denoted by
$\mathbf{z}_i$ and $\mathbf{z}_{it}$, respectively. The
time-constant and time-varying discrete latent variables
are denoted by $w$ and $x_t$, where $1 \le w \le L$ and
$1 \le x_t \le K$. The latter implies that the numbers of
categories of the two types of latent variables equal $L$
and $K$, respectively. To make the distinction between the
two types of latent variables clear, we will refer to
$w$ as a latent class and to $x_t$ as a latent state.

The general model that we use as the starting 
point is the following mixture latent Markov 
model: 


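$$
P(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{w=1}^{L} \sum_{x_0=1}^{K} \sum_{x_1=1}^{K} \cdots \sum_{x_T=1}^{K} P(w, x_0, x_1, \ldots, x_T \mid \mathbf{z}_i)\, P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i) \tag{1}
$$

with

$$
P(w, x_0, x_1, \ldots, x_T \mid \mathbf{z}_i) = P(w \mid \mathbf{z}_i)\, P(x_0 \mid w, \mathbf{z}_{i0}) \prod_{t=1}^{T} P(x_t \mid x_{t-1}, w, \mathbf{z}_{it}) \tag{2}
$$

$$
P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i) = \prod_{t=0}^{T} P(\mathbf{y}_{it} \mid x_t, w, \mathbf{z}_{it}) = \prod_{t=0}^{T} \prod_{j=1}^{J} P(y_{itj} \mid x_t, w, \mathbf{z}_{it}) \tag{3}
$$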


Like many statistical models, the model in equation (1)
describes $P(\mathbf{y}_i \mid \mathbf{z}_i)$, the (probability)
density associated with the responses of subject $i$
conditional on his/her observed covariate values. The
right-hand side of this equation shows that we are dealing
with a mixture model containing one time-constant latent
variable and $T+1$ time-varying latent variables.
The total number of mixture components (or latent classes)
equals $L \cdot K^{T+1}$, which is the product of the
numbers of categories of $w$ and of $x_t$ for
$t = 0, 1, 2, \ldots, T$. As in any mixture model,
$P(\mathbf{y}_i \mid \mathbf{z}_i)$ is obtained as a weighted
average of class-specific probability densities, here
$P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i)$,
where the (prior) class membership probabilities or mixture
proportions, here $P(w, x_0, x_1, \ldots, x_T \mid \mathbf{z}_i)$,
serve as weights (Everitt and Hand, 1981; McLachlan
and Peel, 2000).
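
As a minimal illustration of equation (1), the following Python sketch evaluates the mixture by brute-force enumeration of all L · K^(T+1) components. It is only a sketch under simplifying assumptions that are not part of the chapter: a single response variable (J = 1), no covariates, and transition and response probabilities that are constant over time. The function name and all parameter values are hypothetical.

```python
from itertools import product


def joint_prob_brute_force(y, class_probs, init_probs, trans_probs, resp_probs):
    """P(y_i) from equation (1), summing over every (w, x_0, ..., x_T).

    y            : observed categories (0-based), one response per occasion
    class_probs  : class_probs[w] = P(w)
    init_probs   : init_probs[w][k] = P(x_0 = k | w)
    trans_probs  : trans_probs[w][k][k2] = P(x_t = k2 | x_{t-1} = k, w)
    resp_probs   : resp_probs[w][k][m] = P(y_t = m | x_t = k, w)
    """
    n_classes = len(class_probs)
    n_states = len(init_probs[0])
    total = 0.0
    n_terms = 0
    for w in range(n_classes):
        for states in product(range(n_states), repeat=len(y)):
            # Mixture proportion, equation (2)
            p = class_probs[w] * init_probs[w][states[0]]
            for t in range(1, len(y)):
                p *= trans_probs[w][states[t - 1]][states[t]]
            # Class-specific density, equation (3), with J = 1
            for t, k in enumerate(states):
                p *= resp_probs[w][k][y[t]]
            total += p
            n_terms += 1
    return total, n_terms


# Hypothetical values: L = 2 classes, K = 2 states, 3 occasions (T = 2)
class_probs = [0.6, 0.4]
init_probs = [[0.9, 0.1], [0.5, 0.5]]
trans_probs = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.95, 0.05], [0.1, 0.9]],
]
resp_probs = [
    [[0.99, 0.01], [0.13, 0.87]],
    [[0.99, 0.01], [0.13, 0.87]],
]

p_y, n_terms = joint_prob_brute_force(
    [0, 1, 1], class_probs, init_probs, trans_probs, resp_probs
)
print(n_terms)  # 2 * 2**3 = 16 mixture components
```

The number of terms grows as L · K^(T+1), so this enumeration is feasible only for very short series; it is useful mainly as a check on more efficient algorithms.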

Equations (2) and (3) show the specific structure assumed
for the mixture proportion
$P(w, x_0, x_1, \ldots, x_T \mid \mathbf{z}_i)$ and the
class-specific densities
$P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i)$.
The equation for $P(w, x_0, x_1, \ldots, x_T \mid \mathbf{z}_i)$
assumes that, conditional on $w$ and $\mathbf{z}_i$, $x_t$ is
associated only with $x_{t-1}$ and $x_{t+1}$, and thus not
with the states occupied at the other time points: the
well-known first-order Markov assumption. The equation for
$P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i)$
makes two assumptions: (1) conditionally on $w$, $x_t$, and
$\mathbf{z}_{it}$, the $J$ responses at occasion $t$ are
independent of the latent states and the responses at other
time points, and (2) conditionally on $w$, $x_t$, and
$\mathbf{z}_{it}$, the $J$ responses at occasion $t$ are
mutually independent, which is referred to as
the local independence assumption in latent
class analysis (Goodman, 1974).

As can be seen from equations (2) and (3), the 
models of interest contain four different kinds 
of model probabilities: 


• $P(w \mid \mathbf{z}_i)$ is the probability of belonging to a
particular latent class conditional on a person's
covariate values.

• $P(x_0 \mid w, \mathbf{z}_{i0})$ is an initial-state probability; i.e.,
the probability of having a particular latent
initial state conditional on an individual's
class membership and covariate values at
$t = 0$.

• $P(x_t \mid x_{t-1}, w, \mathbf{z}_{it})$ is a latent transition
probability; i.e., the probability of being in a
particular latent state at time point $t$ conditional
on the latent state at time point $t-1$, class
membership, and time-varying covariate
values.

• $P(y_{itj} \mid x_t, w, \mathbf{z}_{it})$ is a response probability,
which is the probability of having a particular
observed value on response variable $j$ at
time point $t$ conditional on the latent state
occupied at time point $t$, class membership
$w$, and time-varying covariate values.


Typically, these four sets of probabilities will 
be parameterized and restricted by means of 
(logistic) regression models. This is especially 
useful when a model contains covariates, where 
time itself may be one of the time-varying 
covariates of main interest. In the empirical 
application presented below we will use such 
regression models. For extended discussions on 
logistic regression analysis, we refer to intro- 
ductory texts on this topic (see, for example, 
Agresti, 2002; Menard, 2002; Vermunt, 1997). 

The three key elements of the mixture latent 
Markov model described in equations (1), (2), 
and (3) are that it can take into account (1) 


unobserved heterogeneity, (2) autocorrelation, 
and (3) measurement error. Unobserved het- 
erogeneity is captured by the time-constant 
latent variable $w$, autocorrelations are captured
by the first-order Markov transition process in
which the state at time point $t$ may depend
on the state at time point $t-1$, and measurement
error or misclassification is accounted for
by allowing an imperfect relationship between the
time-specific latent states $x_t$ and the observed
responses $y_{itj}$. Note that these are three of
the main elements that should be taken into 
account in the analysis of longitudinal data; 
i.e., the interindividual variability in patterns 
of change, the tendency to stay in the same 
state between consecutive occasions, and spu- 
rious change resulting from measurement error 
in observed responses. 

Parameters of the mixture latent Markov
model can be estimated by means of maximum
likelihood (ML). For that purpose, it is advisable
to use a special variant of the expectation
maximization (EM) algorithm that is
usually referred to as the forward-backward
or Baum-Welch algorithm (Baum et al., 1970;
McDonald and Zucchini, 1997), which is
described in detail in the first appendix.
This special algorithm is needed because our
model contains a potentially huge number of
entries in the joint posterior latent distribution
$P(w, x_0, x_1, \ldots, x_T \mid \mathbf{y}_i, \mathbf{z}_i)$,
except in cases where $T$, $L$ and $K$ are all small.
For example, in a fairly moderate-sized situation where
$T = 10$, $L = 2$ and $K = 3$, the number of entries in the
joint posterior distribution already equals
$2 \cdot 3^{11} = 354{,}294$, a number which is impossible
to process and store for all $N$ subjects, as has
to be done within standard EM. The Baum-Welch
algorithm circumvents the computation
of this joint posterior distribution by making use of
the conditional independencies implied by the
model. Vermunt (2003) proposed a slightly simplified
version of the Baum-Welch algorithm for
dealing with the multilevel latent class model,
which when used for longitudinal data analysis
is one of the special cases of the mixture
latent Markov model described in the next
section.
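
The arithmetic behind this example can be checked with a short Python sketch. The first function counts the cells of the joint posterior; the second counts the cells of the three marginal posteriors that, according to the first appendix, are all the M step needs. The function names are hypothetical.

```python
def joint_posterior_entries(n_classes, n_states, last_occasion):
    """Cells in P(w, x_0, ..., x_T | y_i, z_i): L * K**(T + 1)."""
    return n_classes * n_states ** (last_occasion + 1)


def marginal_posterior_entries(n_classes, n_states, last_occasion):
    """Cells in P(w | .), P(w, x_t | .) for t = 0..T,
    and P(w, x_{t-1}, x_t | .) for t = 1..T."""
    L, K, T = n_classes, n_states, last_occasion
    return L + L * K * (T + 1) + L * K * K * T


print(joint_posterior_entries(2, 3, 10))     # 354294
print(marginal_posterior_entries(2, 3, 10))  # 248
```

The joint count grows exponentially in T, whereas the marginal count grows only linearly, which is what makes the forward-backward scheme practical for long series.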

A common phenomenon in the analysis of
longitudinal data is the occurrence of missing
data. Subjects may have missing values either
because they refused to participate at some
occasions or because the study design dictates it.
A nice feature of the approach described
here is that it can easily accommodate missing
data in the ML estimation of the unknown
model parameters. Let $\delta_{it}$ be an indicator
variable taking on the value 1 if subject $i$ provides
information for occasion $t$ and 0 if this information
is missing. The only required change with
missing data is the following modification of
equation (3):


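$$
P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i) = \prod_{t=0}^{T} \left[ P(\mathbf{y}_{it} \mid x_t, w, \mathbf{z}_{it}) \right]^{\delta_{it}}
$$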


For $\delta_{it} = 1$, nothing changes compared to what
we had before. However, for $\delta_{it} = 0$, the
time-specific conditional density becomes 1, which
means that the responses of a time point with
missing values are skipped. In effect, for each
pattern of missing data, we have a mixture
latent Markov model for a different set of occasions.
Two limitations of the ML estimation procedure
with missing values should be mentioned: (1) it
can deal with missing values on response variables,
but not with missing values on covariates,
and (2) it assumes that the missing data
are missing at random (MAR). The first limitation
may be problematic when there are time-varying
covariates for which the values are also
missing. However, in various special cases discussed
below (the ones that do not use a
transition structure), it is not a problem if
time-varying covariates are missing for the time
points in which the responses are missing. The
second limitation concerns the assumed missing
data mechanism: MAR is the least restrictive
mechanism under which ML estimation can be
used without the need to specify the exact
mechanism causing the missing data; i.e., under
which the missing data mechanism is ignorable
for likelihood-based inference (Little and
Rubin, 1987; Schafer, 1997). It is possible to
relax the MAR assumption by explicitly defining
a not-missing-at-random (NMAR) mechanism
as a part of the model to be estimated (Fay,
1986; Vermunt, 1997).
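
The role of the indicator $\delta_{it}$ in the modified equation (3) can be sketched in a few lines of Python. The sketch assumes a single categorical response per occasion and represents a missing response by `None`; the function names and the numerical values are hypothetical.

```python
def occasion_density(resp_probs_k, y_t):
    """P(y_it | x_t, w, z_it) ** delta_it for one occasion.

    resp_probs_k : response probabilities for the latent state occupied at t
    y_t          : observed category (0-based), or None if missing
    """
    if y_t is None:
        # delta_it = 0: the factor equals 1, so the occasion is skipped
        return 1.0
    # delta_it = 1: the factor is the ordinary response probability
    return resp_probs_k[y_t]


def sequence_density(resp_probs, states, y):
    """Product over occasions of the time-specific densities."""
    p = 1.0
    for k, y_t in zip(states, y):
        p *= occasion_density(resp_probs[k], y_t)
    return p


# Hypothetical response probabilities for K = 2 latent states
resp_probs = [[0.99, 0.01], [0.13, 0.87]]
print(sequence_density(resp_probs, [0, 1, 1], [0, None, 1]))
```

Only the first and third occasions contribute a factor here, so the result is the product of the two corresponding response probabilities.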

An issue strongly related to missing data is 
the one of unequally spaced measurement occa- 
sions. As long as the model parameters defin- 
ing the transition probability are assumed to 
be occasion specific, no special arrangements 
are needed. If this is not the case, unequally 
spaced measurements can be handled by defin- 
ing a grid of equally spaced time points con- 
taining all measurement occasions. Using this 
technique, the information on the extraneous 
occasions can be treated as missing data for all 
subjects. An alternative is to use a continuous-
time rather than a discrete-time framework
(Böckenholt, 2005), which can be seen as
the limiting case in which the elapsed time 
between consecutive time points in the grid 
approaches zero. 

Another issue related to missing data is the
choice of the time variable and the corresponding
starting point of the process. The most common
approach is to use calendar time as the
time variable and the first measurement occasion
as $t = 0$, but one may, for example, also
use age as the relevant time variable, as we do
in the empirical example. Although children's
ages at the first measurement vary between 11
and 17, we use age 11 as $t = 0$. This implies that
for a child who is 12 years of age at the first
measurement, information at $t = 0$ is treated as
missing, for a child who is 13 years of age,
information at $t = 0$ and $t = 1$ is
treated as missing, etc.


3 The most important special cases 


Table 23.1 lists the various special cases that
can be derived from the mixture latent Markov
model defined in equations (1)-(3) by assuming
that one or more of its three elements
(transition structure, measurement error, and
unobserved heterogeneity) is not present or
needs to be ignored because the data are not
informative enough to deal with it. Models I-III
and V-VII are latent class models, but IV and
VIII are not. Model VII differs from models I-VI
in that it is a model for repeated cross-sectional
data rather than a model for panel data. Below
we describe the various special cases in more
detail.

Table 23.1 Classification of latent class models for longitudinal research

      Model name               Transition   Unobserved      Measurement
                               structure    heterogeneity   error

I     Mixture latent Markov    yes          yes             yes
II    Mixture Markov           yes          yes             no
III   Latent Markov            yes          no              yes
IV    Standard Markov*         yes          no              no
V     Mixture latent growth    no           yes             yes
VI    Mixture growth           no           yes             no
VII   Standard latent class    no           no              yes
VIII  Independence*            no           no              no

*This model is not a latent class model.


3.1 Mixture latent Markov 


First of all, it is possible to define simpler ver- 
sions of the mixture latent Markov model itself. 
Actually, the mixed Markov latent class model 
proposed by Van de Pol and Langeheine (1990) 
which served as an inspiration for our model 
is the special case of our model when nei- 
ther time-constant nor time-varying covariates 
are present. Van de Pol and Langeheine (1990) 
also proposed a variant in which the four types 
of model probabilities could differ across cat- 
egories of a grouping variable (see also Lange- 
heine and Van de Pol, 2002). A similar model is 
obtained by replacing the $\mathbf{z}_i$ and $\mathbf{z}_{it}$ in equations
(1)-(3) by a single categorical covariate $z_i$.


3.2 Mixture Markov


The mixture Markov model (Poulsen, 1982) is
the special case of the model presented in
equations (1)-(3) when there is a single response
variable that is assumed to be measured without
error. The model is obtained by replacing
the more general definition in equation
(3) with

$$
P(\mathbf{y}_i \mid w, x_0, x_1, \ldots, x_T, \mathbf{z}_i) = \prod_{t=0}^{T} P(y_{it} \mid x_t)
$$

where $K = M$ and $P(y_{it} \mid x_t) = 1$ if $x_t = y_{it}$ and
0 otherwise. The product over the multiple
response variables and the index $j$ can be omitted
because $J = 1$, and $y_{it}$ is assumed not to
depend on $w$ and $\mathbf{z}_{it}$, but only on $x_t$. For this
special case the number of latent states ($K$) is equal
to the number of observed states ($M$) and the
relationship between $x_t$ and $y_{it}$ is perfect, which
indicates that $x_t$ is measured without error.

A special case of this mixture Markov model
is the mover-stayer model (Goodman, 1961).
This model assumes that $L = 2$ and that the
probabilities of a transition to a different state
are fixed to 0 for one class, say for $w = 2$.
Members of this class, for which
$P(x_t \mid x_{t-1}, w = 2, \mathbf{z}_{it}) = 1$ if $x_t = x_{t-1}$
and 0 otherwise, are called stayers. Note that the
mover-stayer constraint can be imposed not only in the
mixture Markov model but also in the mixture latent
Markov model, in which case transitions across
imperfectly measured states are assumed not to occur
in the stayer class.

Because of the perfect match between $x_t$ and
$y_{it}$, the mixture Markov model can also be
defined without latent states $x_t$; i.e., as:

$$
P(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{w=1}^{L} P(w \mid \mathbf{z}_i)\, P(y_{i0} \mid w, \mathbf{z}_{i0}) \prod_{t=1}^{T} P(y_{it} \mid y_{i,t-1}, w, \mathbf{z}_{it})
$$
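
The mixture Markov likelihood without latent states, including the mover-stayer restriction, can be sketched in Python as follows. The sketch assumes no covariates and time-homogeneous transition probabilities; classes are indexed from 0, so the stayer class ($w = 2$ in the text) is index 1. All names and numerical values are hypothetical.

```python
def mixture_markov_prob(y, class_probs, init_probs, trans_probs):
    """P(y_i) = sum_w P(w) P(y_i0 | w) prod_t P(y_it | y_i,t-1, w)."""
    total = 0.0
    for w, p_w in enumerate(class_probs):
        p = p_w * init_probs[w][y[0]]
        for t in range(1, len(y)):
            p *= trans_probs[w][y[t - 1]][y[t]]
        total += p
    return total


# Hypothetical values: class 0 = movers, class 1 = stayers
class_probs = [0.7, 0.3]
init_probs = [[0.8, 0.2], [0.9, 0.1]]
trans_probs = [
    [[0.85, 0.15], [0.40, 0.60]],  # movers: free transition probabilities
    [[1.0, 0.0], [0.0, 1.0]],      # stayers: identity matrix
]

# A sequence with a change of state can only come from the mover class
p_change = mixture_markov_prob([0, 1, 0], class_probs, init_probs, trans_probs)
# A constant sequence receives contributions from both classes
p_constant = mixture_markov_prob([0, 0, 0], class_probs, init_probs, trans_probs)
```

Fixing the stayer transition matrix to the identity is what distinguishes the mover-stayer model from an unrestricted two-class mixture Markov model.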


3.3 Latent Markov model


The latent Markov, latent transition, or hidden
Markov model (Baum et al., 1970; Collins
and Wugalter, 1992; Van de Pol and De Leeuw,
1986; Vermunt, Langeheine and Böckenholt,
1999; Wiggins, 1973) is the special case of
the mixture latent Markov that is obtained by 
eliminating the time-constant latent variable w 
from the model, i.e., by assuming that there 
is no unobserved heterogeneity or that it can 
be ignored. The latent Markov model can be 
obtained without modifying the formulae, but 
by simply assuming that L = 1; i.e., that all sub- 
jects belong to the same latent class. 

The latent Markov model yields estimates 
for the initial-state and transition probabili- 
ties, as well as for how these are affected by 
covariate values, while correcting for measure- 
ment error in the observed states. The model 
can be applied with a single or with multiple 
response variables. When applied with a single 
categorical response variable, one will typically 
assume that the number of latent states equals 
the number of categories of the response variable:
$K = M$. Moreover, model restrictions are
required to obtain an identified model, the most 
common of which are time-homogeneous tran- 
sition probabilities or time-homogeneous mis- 
classification probabilities. 

When used with multiple indicators, the 
model is a longitudinal data extension of the 
standard latent class model (Hagenaars, 1990). 
The time-specific latent states can be seen as 
clusters or types which differ in their responses 


on the J indicators, and the Markovian transi- 
tion structure is used to describe and predict 
changes that may occur across adjacent mea- 
surement occasions. 


3.4 Markov model 


By assuming both perfect measurement as in the 
mixture Markov model and absence of unob- 
served heterogeneity as in the latent Markov 
model, one obtains a standard Markov model, 
which is no longer a latent class model. This 
model can further serve as a simple starting 
point for longitudinal applications with a single 
response variable, where one wishes to assume 
a Markov structure. It provides a baseline for 
comparison to the three more extended models 
discussed above. Use of these more extended 
models makes sense only if they provide a sig- 
nificantly better description of the data than the 
simple Markov model. 


3.5 Mixture latent growth model 


Now we turn to latent class models for 
longitudinal research that are not transition 
or Markov models. These mixture growth 
models are nonparametric random-effects mod- 
els (Aitkin, 1999; Skrondal and Rabe-Hesketh, 
2004; Vermunt and Van Dijk, 2001) for lon- 
gitudinal data that assume that dependencies 
between measurement occasions can be cap- 
tured by the time-constant latent variable w. 
The most extended variant is the mixture latent 
growth model, which is obtained from the 
mixture latent Markov model by imposing the 
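constraint $P(x_t \mid x_{t-1}, w, \mathbf{z}_{it}) = P(x_t \mid w, \mathbf{z}_{it})$. This is
achieved by replacing equation (2) with

$$
P(w, x_0, x_1, \ldots, x_T \mid \mathbf{z}_i) = P(w \mid \mathbf{z}_i) \prod_{t=0}^{T} P(x_t \mid w, \mathbf{z}_{it}).
$$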


This model is a variant for longitudinal data of 
the multilevel latent class model proposed by 
Vermunt (2003): subjects are the higher-level 
units and time points the lower-level units. It 
should be noted that application of this very
interesting model requires that there be at least
two response variables ($J \ge 2$).

In mixture growth models one will typically 
pay a lot of attention to the modeling of the time 
dependence of the state occupied at the dif- 
ferent time points. The latent class or mixture 
approach allows identifying subgroups (cate- 
gories of the time-constant latent variable w) 
with different change patterns (Nagin, 1999). 
The extension provided by the mixture latent 
growth model is that the dynamic dependent 
variable is itself a (discrete) latent variable 
which is measured by multiple indicators. 


3.6 Mixture growth model 


The mixture or latent class growth model 
(Nagin, 1999; Muthén, 2004; Vermunt, 2006) 
can be seen as a restricted variant of the mixture 
latent growth model; i.e., as a model for a sin- 
gle indicator measured without error. The extra 
constraint is the same as the one used in the 
mixture Markov model: $K = M$ and $P(y_{it} \mid x_t) = 1$
if $x_t = y_{it}$ and 0 otherwise.

A more natural way to define the mixture
growth model is by omitting the time-varying
latent variable $x_t$ from the model specification,
as we did for the mixture Markov model. This
yields

$$
P(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{w=1}^{L} P(w \mid \mathbf{z}_i) \prod_{t=0}^{T} P(y_{it} \mid w, \mathbf{z}_{it}).
$$


Note that this model is equivalent to a standard 
latent class model for T+1 response variables 
and with predictors affecting these responses. 


3.7 Standard latent class model


When we eliminate both w and the transition 
structure, we obtain a latent class model that 
assumes observations are independent across 
occasions. This is a realistic model only for the 
analysis of data from repeated cross-sections; 
i.e., to deal with the situation in which observa- 
tions from different occasions are independent 
because each subject provides information for 


only one time point. One possible way to define 
this model is 


$$
P(\mathbf{y}_i \mid \mathbf{z}_{it_i}) = \sum_{x=1}^{K} P(x \mid \mathbf{z}_{it_i}) \prod_{j=1}^{J} P(y_{ij} \mid x, \mathbf{z}_{it_i})
$$

where $t_i$ is used to denote the time point for
which subject i provides information. This is a 
standard latent class model with covariates. 


4 Application to NYS data 


To illustrate the latent class models described 
above we use data from the nine-wave National 
Youth Survey (Elliott, Huizinga and Menard, 
1989) for which data were collected annually 
from 1976 to 1980 and at three-year intervals 
after 1980. At the first measurement occasion, 
the ages of the 1725 children varied between 
11 and 17. To account for the unequal spacing 
across panel waves and to use age as the time 
scale, we define a model for 23 time points (T+ 
1 = 23), where t= 0 corresponds to age 11 and 
the last time point to age 33. For each subject, 
we have observed data for at most 9 time points 
(the average is 7.93) which means that the other 
time points are treated as missing values. 
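
The mapping from panel waves to the age grid can be sketched as follows. The wave years are inferred from the description above (annual waves from 1976 to 1980, then every three years) and should be read as an assumption; the function name is hypothetical, and `None` marks a time point treated as missing.

```python
# Assumed wave years, inferred from the description of the nine-wave design
WAVE_YEARS = [1976, 1977, 1978, 1979, 1980, 1983, 1986, 1989, 1992]
FIRST_AGE, LAST_AGE = 11, 33  # t = 0 is age 11, t = 22 is age 33


def to_age_grid(age_at_first_wave, responses):
    """Place up to nine wave responses on the 23-point age grid."""
    grid = [None] * (LAST_AGE - FIRST_AGE + 1)
    for year, resp in zip(WAVE_YEARS, responses):
        age = age_at_first_wave + (year - WAVE_YEARS[0])
        if resp is not None and FIRST_AGE <= age <= LAST_AGE:
            grid[age - FIRST_AGE] = resp
    return grid


# A child aged 13 at the first wave: t = 0 and t = 1 are missing
grid = to_age_grid(13, [1, 1, 1, 2, 2, 2, 1, 1, 1])
print(len(grid))  # 23
```

Under this assumed wave schedule, a child aged 17 at the first wave reaches age 33 at the last wave, which matches the last time point of the grid.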

We study the change in a dichotomous 
response variable “drugs” indicating whether 
young persons used hard drugs during the past 
year (1=no; 2=yes). It should be noted that 
among the 11-year-olds in the sample nobody 
reported to have used hard drugs, which is 
something that needs to be taken into account in 
our model specification. Time-varying predic- 
tors are age and age squared, and time-constant 
predictors are gender and ethnicity. 

A preliminary analysis showed that there is 
a clear age-dependence in the reported hard- 
drugs use which can well be described by a 
quadratic function: usage first increases with 
age and subsequently decreases. That is why 
we used this type of time dependence in all 
reported models. To give an idea how the time 
dependence enters in the models, the specific
regression model for the latent transition
probabilities in the estimated Markov models was:

$$
\log \frac{P(x_t = k' \mid x_{t-1} = k, w, \text{age}_{it})}{P(x_t = k \mid x_{t-1} = k, w, \text{age}_{it})} = \beta_{0k'k} + \beta_{1k'k}\, d_{w=2} + \beta_{2k'k}\, \text{age}_{it} + \beta_{3k'k}\, (\text{age}_{it})^2
$$

where the $\beta$ coefficients are fixed to 0 for $k' = k$.
The variable $d_{w=2}$ is a dummy variable for
the second mixture component. For the initial
state, we do not have a model with free parameters
but we simply assume that all children start
in the no-drugs state at age 11.
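
To show how such a logit model translates into transition probabilities, here is a minimal Python sketch for K = 2 states. The two coefficients for the class dummy are the estimates reported later in this section (-2.37 and 3.72); the intercepts and age coefficients are hypothetical placeholders, since the chapter does not report them, and the function name is likewise an assumption.

```python
import math


def transition_probs(age, w, coef):
    """Transition matrix for K = 2 states (0 = non-use, 1 = use).

    coef[(k_from, k_to)] = (b0, b1, b2, b3) for k_to != k_from.
    Coefficients for k_to == k_from are fixed to 0, which makes
    staying in the same state the reference category.
    """
    d = 1.0 if w == 2 else 0.0  # dummy for the second mixture component
    matrix = []
    for k in range(2):
        logits = []
        for k2 in range(2):
            if k2 == k:
                logits.append(0.0)
            else:
                b0, b1, b2, b3 = coef[(k, k2)]
                logits.append(b0 + b1 * d + b2 * age + b3 * age ** 2)
        denom = sum(math.exp(v) for v in logits)
        matrix.append([math.exp(v) / denom for v in logits])
    return matrix


# b1 values are the reported class effects; b0, b2, b3 are hypothetical
coef = {
    (0, 1): (-12.0, -2.37, 1.0, -0.025),  # entering the use state
    (1, 0): (-10.0, 3.72, 0.6, -0.012),   # leaving the use state
}
class1 = transition_probs(21, 1, coef)
class2 = transition_probs(21, 2, coef)
```

With these inputs, class 2 has a lower probability of entering the use state and a higher probability of leaving it than class 1, which is the qualitative pattern the class effects imply.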

In the mixture growth models, we use the following
binary logistic regression model for $y_{it}$:

$$
\log \frac{P(y_{it} = 2 \mid w, \text{age}_{it})}{P(y_{it} = 1 \mid w, \text{age}_{it})} = \beta_{0w} + \beta_{1w}\, \text{age}_{it} + \beta_{2w}\, (\text{age}_{it})^2
$$

where we fix $\beta_{01} = -100$ and $\beta_{11} = \beta_{21} = 0$ to
obtain a model in which $w = 1$ represents a
non-user class, a class with an (effectively) zero
probability of using drugs at all time points.
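
A short Python sketch shows why an intercept of -100 yields an effectively zero probability of use, and how a quadratic age effect produces a rise followed by a decline. The coefficients of the "user" class below are hypothetical and serve only to illustrate the shape of the curve.

```python
import math


def prob_use(age, b0, b1, b2):
    """P(y_it = 2 | w, age_it) under the binary logit b0 + b1*age + b2*age**2."""
    eta = b0 + b1 * age + b2 * age ** 2
    return 1.0 / (1.0 + math.exp(-eta))


# Non-user class: intercept fixed to -100, age effects fixed to 0
non_user = prob_use(20, -100.0, 0.0, 0.0)

# Hypothetical user class with an inverted-U age profile
user_curve = {age: prob_use(age, -20.0, 1.8, -0.045) for age in (12, 20, 33)}
```

The non-user probability is smaller than 1e-40 at every age, so fixing the intercept to -100 acts as a structural zero without requiring a separate model component.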

Table 23.2 reports the fit measures for the
estimated models, where the first set of models
does not contain the time-constant covariates
gender and ethnicity. As can be seen from the
log-likelihood and BIC values, the various types of
Markov models perform much better than the 
mixture growth models, which indicates that 
there is a clear autocorrelation structure that 
is difficult to capture using a growth model. 
Even with 7 latent classes one does not obtain 
a fit that is as good as the Markov-type models. 
Among the Markov models, the most general 
model — the mixture latent Markov model — 
performs best. By removing measurement error, 
simplifying the mixture into a mover-stayer 
structure, and/or eliminating the mixture struc- 
ture, the fit deteriorates significantly. The last 
two models are mixture latent Markov models 
in which we introduced covariates in the model 
for the mixture proportions. Both sex and eth- 
nicity seem to be significantly related to the 
mixture component someone belongs to. 
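
The BIC values in Table 23.2 can be related to the log-likelihoods with a short calculation. The chapter does not state the formula, so the sketch below assumes the common definition BIC = -2 log L + p ln(N) with N = 1725 subjects; under that assumption the computed values agree with the reported ones up to the rounding of the log-likelihoods.

```python
import math


def bic(log_lik, n_params, n_subjects):
    """BIC = -2 log L + (number of parameters) * ln(N)."""
    return -2.0 * log_lik + n_params * math.log(n_subjects)


N = 1725  # number of children in the sample (assumed to be the N used for BIC)
reported = {  # model: (log-likelihood, reported BIC, number of parameters)
    "A": (-5089, 10200, 3),
    "B": (-4143, 8330, 6),
    "F": (-3992, 8066, 11),
    "J": (-3975, 8061, 15),
}
for name, (log_lik, bic_reported, n_params) in reported.items():
    print(name, round(bic(log_lik, n_params, N), 1), bic_reported)
```

Because each additional parameter costs about ln(1725) ≈ 7.45 BIC points, a model is preferred only if every added parameter raises the log-likelihood by more than roughly 3.7.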

Table 23.2 Fit measures for the estimated models with the nine-wave National Youth Survey data set

Model                                              Log-likelihood   BIC     # Parameters

A.  Independence                                   -5089            10200   3
B.  Markov                                         -4143            8330    6
C.  Mixture Markov with L = 2                      -4020            8108    9
D.  Mover-stayer Markov                            -4056            8165    7
E.  Latent Markov with K = 2                       -4009            8078    8
F.  Mixture latent Markov with L = 2 and K = 2     -3992            8066    11
G.  Mover-stayer latent Markov with K = 2          -4000            8068    9
H1. Mixture growth with L = 2 (w = 1 non-users)    -4381            8792    4
H2. Mixture growth with L = 3 (w = 1 non-users)    -4199            8457    8
H3. Mixture growth with L = 4 (w = 1 non-users)    -4113            8315    12
H4. Mixture growth with L = 5 (w = 1 non-users)    -4077            8273    16
H5. Mixture growth with L = 6 (w = 1 non-users)    -4037            8223    20
H6. Mixture growth with L = 7 (w = 1 non-users)    -4024            8227    24
I.  F + Gender effect on W                         -3992            8066    12
J.  F + Gender and Ethnicity effect on W           -3975            8061    15

The parameters of the final model consist of
the logit coefficients of the model for $w$, the logit
coefficients in the model for the latent transition
probabilities, and the probabilities of the
measurement model. The latter show that the
two latent states are rather strongly connected
to the two observed states: $P(y_{it} = 1 \mid x_t = 1) = 0.99$
and $P(y_{it} = 2 \mid x_t = 2) = 0.87$.

The most relevant coefficients in the model
for the transition probabilities are the parameters
for $w$. These show that class 2 is the
low-risk class, having a much lower probability
than class 1 of entering into the use
state ($\beta = -2.37$; S.E. = 0.26) and a much higher
probability of leaving the use state
($\beta = 3.72$; S.E. = 0.68). Combining these estimates
with the quadratic time dependence of the transitions
yields a probability of moving from the
non-use to the use state equal to 2.8% at age 12,
23.4% at age 21, and 0.6% at age 33 for $w = 1$,
and equal to 0.3% at age 12, 2.8% at age 21,
and 0.1% at age 33 for $w = 2$. The probability
of a transition from the use to the non-use state
equals 0.1% at age 12, 20.5% at age 26, and
6.2% at age 33 for $w = 1$, and 4.1% at age 12,
91.4% at age 26, and 73.1% at age 33 for $w = 2$.

The parameters in the logistic regression
model for $w$ show that males are less likely
to be in the low-risk class than females
($\gamma = -0.58$; S.E. = 0.14) and that blacks are more
likely to be in the low-risk class than whites
($\gamma = 0.79$; S.E. = 0.22). Hispanics are less likely
($\gamma = -0.46$; S.E. = 0.33) and other ethnic groups
more likely ($\gamma = 0.25$; S.E. = 0.52) to be in
class 2 than whites, but these effects are
non-significant.


5 Discussion 


We presented a general framework for the anal- 
ysis of discrete-time longitudinal data and illus- 
trated it with an empirical example in which 
the Markov-like models turned out to perform 
better than the growth models. 

The approach presented here can be
expanded in various ways. First, while we 
focused on models for categorical response vari- 
ables, it is straightforward to apply most of 
these models to variables of other scale types, 
such as continuous dependent variables or 
counts. Other extensions include the definition 
of multiple processes with multiple x, or of 


higher-order Markov processes. Models that are 
getting increased attention are those that com- 
bine discrete and continuous latent variables. 
Finally, the approach can be expanded to deal 
with multilevel longitudinal data, as well as 
with data obtained from complex survey sam- 
ples. Each of these extensions is implemented 
in the Latent GOLD software that we used for 
parameter estimation. 


Appendix A: Baum-Welch algorithm 
for the mixture latent Markov model 


Maximum likelihood (ML) estimation of the 
parameters of the mixture latent Markov 
model involves maximizing the log-likelihood 
function: 


$$
\log \mathcal{L} = \sum_{i=1}^{N} \log P(\mathbf{y}_i \mid \mathbf{z}_i),
$$

a problem that can be solved by means of
the EM algorithm (Dempster, Laird and Rubin,
1977). In the E step, we compute

$$
P(w, x_0, x_1, \ldots, x_T \mid \mathbf{y}_i, \mathbf{z}_i) = \frac{P(w, x_0, x_1, \ldots, x_T, \mathbf{y}_i \mid \mathbf{z}_i)}{P(\mathbf{y}_i \mid \mathbf{z}_i)},
$$

which is the joint conditional distribution
of the $T+2$ latent variables given the data
and the model parameters. In the M step,
one updates the model parameters using standard
ML methods for logistic regression analysis
and using an expanded data matrix with
$P(w, x_0, x_1, \ldots, x_T \mid \mathbf{y}_i, \mathbf{z}_i)$ as weights.

It should be noted that in a standard EM
algorithm, at each iteration, one needs to
compute and store the $L \cdot K^{T+1}$ entries of
$P(w, x_0, x_1, \ldots, x_T \mid \mathbf{y}_i, \mathbf{z}_i)$ for each
subject or, with grouped data, for each unique data
pattern. This implies that computation time and
computer storage increase exponentially with
the number of time points, which makes
this algorithm impractical or even impossible
to apply with more than a few time
points (Vermunt, Langeheine and Böckenholt,
1999). However, because of the collapsibility
of the mixture latent Markov model, it
turns out that in the M step of the EM algorithm
one needs only the marginal distributions
$P(w \mid \mathbf{y}_i, \mathbf{z}_i)$, $P(w, x_t \mid \mathbf{y}_i, \mathbf{z}_i)$, and
$P(w, x_{t-1}, x_t \mid \mathbf{y}_i, \mathbf{z}_i)$.
The Baum-Welch or forward-backward algorithm
obtains these quantities directly rather
than first computing $P(w, x_0, x_1, \ldots, x_T \mid \mathbf{y}_i, \mathbf{z}_i)$
and subsequently collapsing over the remaining
dimensions as would be done in a standard
EM algorithm (Baum et al., 1970; McDonald
and Zucchini, 1997). This yields an algorithm
that makes the mixture latent Markov model
applicable with any number of time points.
Whereas the original forward-backward algorithm
was for latent (hidden) Markov models
without covariates and with a single response
variable, here we provide a generalization to the
more general case with a mixture variable $w$, covariates
$\mathbf{z}_i$, and multiple responses.

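The two key components of the Baum-Welch
algorithm are the forward probabilities $\alpha_{iwx_t}$ and
the backward probabilities $\beta_{iwx_t}$. Because of our
generalization to the mixture case, we need an
additional quantity $\gamma_{iw}$. These three quantities
are defined as follows:

$$
\begin{aligned}
\alpha_{iwx_t} &= P(x_t, \mathbf{y}_{i0}, \ldots, \mathbf{y}_{it} \mid w, \mathbf{z}_i), \\
\beta_{iwx_t} &= P(\mathbf{y}_{i,t+1}, \ldots, \mathbf{y}_{iT} \mid x_t, w, \mathbf{z}_i), \\
\gamma_{iw} &= P(w, \mathbf{y}_i \mid \mathbf{z}_i).
\end{aligned}
$$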


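Using $\alpha_{iwx_t}$, $\beta_{iwx_t}$, and $\gamma_{iw}$, one can obtain the
relevant marginal posteriors as follows:

$$
P(w \mid \mathbf{y}_i, \mathbf{z}_i) = \frac{\gamma_{iw}}{P(\mathbf{y}_i \mid \mathbf{z}_i)}, \tag{4}
$$

$$
P(w, x_t \mid \mathbf{y}_i, \mathbf{z}_i) = \frac{P(w \mid \mathbf{z}_i)\, \alpha_{iwx_t}\, \beta_{iwx_t}}{P(\mathbf{y}_i \mid \mathbf{z}_i)}, \tag{5}
$$

$$
P(w, x_{t-1}, x_t \mid \mathbf{y}_i, \mathbf{z}_i) = \frac{P(w \mid \mathbf{z}_i)\, \alpha_{iwx_{t-1}}\, P(x_t \mid x_{t-1}, w, \mathbf{z}_{it})\, P(\mathbf{y}_{it} \mid x_t, w, \mathbf{z}_{it})\, \beta_{iwx_t}}{P(\mathbf{y}_i \mid \mathbf{z}_i)}, \tag{6}
$$

where $P(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{w=1}^{L} \gamma_{iw}$, and
$P(x_t \mid x_{t-1}, w, \mathbf{z}_{it})$ and $P(\mathbf{y}_{it} \mid x_t, w, \mathbf{z}_{it})$ are model
probabilities.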


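The key element of the forward-backward
algorithm is that $T+1$ sets of $\alpha_{iwx_t}$ and $\beta_{iwx_t}$
terms are computed using recursive schemes.
The forward recursion scheme for $\alpha_{iwx_t}$ is:

$$
\begin{aligned}
\alpha_{iwx_0} &= P(x_0 \mid w, \mathbf{z}_{i0})\, P(\mathbf{y}_{i0} \mid x_0, w, \mathbf{z}_{i0}), \\
\alpha_{iwx_t} &= \left\{ \sum_{x_{t-1}=1}^{K} \alpha_{iwx_{t-1}}\, P(x_t \mid x_{t-1}, w, \mathbf{z}_{it}) \right\} P(\mathbf{y}_{it} \mid x_t, w, \mathbf{z}_{it})
\end{aligned}
$$

for $t = 1$ up to $t = T$. The backward recursion
scheme for $\beta_{iwx_t}$ is:

$$
\begin{aligned}
\beta_{iwx_T} &= 1, \\
\beta_{iwx_t} &= \sum_{x_{t+1}=1}^{K} \beta_{iwx_{t+1}}\, P(x_{t+1} \mid x_t, w, \mathbf{z}_{i,t+1})\, P(\mathbf{y}_{i,t+1} \mid x_{t+1}, w, \mathbf{z}_{i,t+1})
\end{aligned}
$$

for $t = T-1$ down to $t = 0$. The quantity $\gamma_{iw}$ is
obtained as:

$$
\gamma_{iw} = \sum_{x_t=1}^{K} P(w \mid \mathbf{z}_i)\, \alpha_{iwx_t}\, \beta_{iwx_t}
$$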


for any $t$. So, first we obtain $\alpha_{iwx_t}$ and $\beta_{iwx_t}$ for
each time point and subsequently we obtain
$\gamma_{iw}$. Next, we compute $P(w \mid y_i, z_i)$, $P(w, x_t \mid y_i, z_i)$,
and $P(w, x_{t-1}, x_t \mid y_i, z_i)$ using equations (4), (5),
and (6). In the M step, these quantities are used
to obtain new estimates for the mixture latent
Markov model probabilities appearing in equations (2) and (3) using standard methods for
logistic regression analysis.

The only change required in the above formulas when there are missing data is that
$P(y_{it} \mid x_t, w, z_{it})$ is replaced by $P(y_{it} \mid x_t, w, z_{it})^{\delta_{it}}$ in
each of the above equations, where $\delta_{it} = 1$ if $y_{it}$
is observed and 0 if $y_{it}$ is missing. This implies
that $P(y_{it} \mid x_t, w, z_{it})$ is "skipped" when $y_{it}$ is missing. In the M step, cases with missing responses
at occasion $t$ do not contribute to the estimation
of the response probabilities for that occasion,
but they do contribute to the estimation of the
other model probabilities.
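
The recursions and the posteriors in equations (4) to (6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the Latent GOLD implementation: the function name `forward_backward`, the argument layout, and the example numbers are hypothetical, a single response variable is used, and the transition and response probabilities are assumed not to depend on time-varying covariates. Missing responses are handled by setting the response term to 1, as described above.

```python
def forward_backward(prior_w, init, trans, emit, y, observed=None):
    """Posteriors for one case of a mixture latent Markov model (sketch).

    prior_w[w]     = P(w | z_i)
    init[w][x]     = P(x_0 = x | w)
    trans[w][a][b] = P(x_t = b | x_{t-1} = a, w)  (assumed constant over time)
    emit[w][x][v]  = P(y_it = v | x_t = x, w)
    y[t]           = response code at occasion t (ignored when missing)
    observed[t]    = the indicator delta_it
    """
    n_w, n_x, n_t = len(prior_w), len(init[0]), len(y)
    if observed is None:
        observed = [True] * n_t

    def e(w, x, t):
        # Response probability raised to the power delta_it.
        return emit[w][x][y[t]] if observed[t] else 1.0

    alpha = [[[0.0] * n_x for _ in range(n_t)] for _ in range(n_w)]
    beta = [[[1.0] * n_x for _ in range(n_t)] for _ in range(n_w)]
    for w in range(n_w):
        for x in range(n_x):
            alpha[w][0][x] = init[w][x] * e(w, x, 0)
        for t in range(1, n_t):  # forward recursion
            for x in range(n_x):
                s = sum(alpha[w][t - 1][a] * trans[w][a][x] for a in range(n_x))
                alpha[w][t][x] = s * e(w, x, t)
        for t in range(n_t - 2, -1, -1):  # backward recursion
            for x in range(n_x):
                beta[w][t][x] = sum(
                    beta[w][t + 1][b] * trans[w][x][b] * e(w, b, t + 1)
                    for b in range(n_x)
                )

    # gamma_iw, evaluated at the last occasion (any t gives the same value).
    gamma = [prior_w[w] * sum(alpha[w][-1]) for w in range(n_w)]
    p_y = sum(gamma)
    post_w = [g / p_y for g in gamma]  # equation (4)
    post_wx = [[[prior_w[w] * alpha[w][t][x] * beta[w][t][x] / p_y
                 for x in range(n_x)] for t in range(n_t)]
               for w in range(n_w)]  # equation (5)
    post_wxx = [[[[prior_w[w] * alpha[w][t - 1][a] * trans[w][a][b]
                   * e(w, b, t) * beta[w][t][b] / p_y
                   for b in range(n_x)] for a in range(n_x)]
                 for t in range(1, n_t)]
                for w in range(n_w)]  # equation (6)
    return {"alpha": alpha, "beta": beta, "gamma": gamma, "p_y": p_y,
            "post_w": post_w, "post_wx": post_wx, "post_wxx": post_wxx}


# Hypothetical two-class, two-state example with the third response missing.
prior_w = [0.6, 0.4]
init = [[0.8, 0.2], [0.5, 0.5]]
trans = [[[0.9, 0.1], [0.2, 0.8]], [[1.0, 0.0], [0.0, 1.0]]]
emit = [[[0.9, 0.1], [0.2, 0.8]], [[0.9, 0.1], [0.2, 0.8]]]
result = forward_backward(prior_w, init, trans, emit,
                          y=[0, 1, None, 1],
                          observed=[True, True, False, True])
```

Because only sums over one or two adjacent latent states are formed, the work grows linearly with the number of time points, which is the property the text emphasizes.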


Appendix B: Examples of Latent 
GOLD syntax files 


The Latent GOLD 5.0 software package 
(Vermunt and Magidson, 2005) implements the 
framework described in this article. In this 
appendix, we provide examples of input files 
for estimation of mixture latent Markov models, 
mixture Markov, latent Markov, and mixture 
growth models. 

The data should be in the format of a person- 
period file, where for the Markov-type mod- 
els periods with missing values should also be 
included in the file since each next record for 
the same subject is assumed to be the next time 
point. The definition of a model contains three 
main sections: “options”, “variables” and 
“equations”. 

An example of the most extended model, 
the mixture latent Markov model is the 
following: 


options
   missing=all;
   coding=first;
variables
   caseid id;
   dependent drugs nominal;
   independent gender nominal,
      ethnicity nominal, age numeric,
      age2 numeric;
   latent
      W nominal 2;
      X nominal markov 2;
equations
   W <- 1 + gender + ethnicity;
   X[=0] <- (-100) 1;
   X <- (a~) 1 | X[-1] + (b~) W | X[-1] +
      (c~) age | X[-1] + (d~) age2 | X[-1];
   drugs <- (e~) 1 | X;


In the options section, only the two commands for which we changed the default setting are shown. The statement “missing=all”
indicates that all records with missing values
should be retained in the analysis. The option
“coding=first” requests dummy coding for
the nominal variables using the first category as
the reference category.

In the variables section we define the 
caseid variable connecting the multiple records 
of one person, the latent, dependent (or 
response) and independent variables to be used 
in the analysis, as well as various attributes of 
these variables, such as their scale type and, for 
categorical latent variables, their number of cat- 
egories and whether they vary over time (indi- 
cated with the statement markov). 

The equations section contains four equations: one for the mixture variable (W), one for
the initial state (X[=0]), one for the state at
time point t (X) conditional on the state at t-1
(X[-1]), and one for the response variable.
With more response variables, one would have 
a separate equation for each response variable. 
The logit model for W contains an intercept (the 
term “1”) and effects of gender and ethnicity. 
The parameter labels, a, b, c, d, and e are given 
in parentheses. The model for X[=0] contains 
an intercept that is fixed to -100, which means 
that everyone starts in latent state 1. The model 
for X is parameterized in such a way that the 
intercept and the effects of W, age, and age2 can 
be interpreted as effects on the logit of a tran- 
sition (as in the equation provided in the text). 
This is achieved by the conditioning “| X[-1]” 
combined with the tilde “~” in the parameter 
label, which yields a special coding of logit 
coefficients in which the no-change category 
serves as the reference category. The model for 
the response variable drugs contains an inter- 
cept which varies across latent states, with the 
same type of coding as used for the transition. 

A mixture Markov model is obtained with the extra
line “e = -100;”. This fixes the logit parameters in the model for the response variable
to -100, which because of the special coding induced by the tilde in the parameter
label yields a perfect relationship between
X and drugs. The 2-class mixture can be
changed into a mover-stayer structure with 
the additional line “b = -100;” which fixes 
the transition probabilities to 0 for the second 
class. This restriction can be used in the mix- 
ture Markov and in the mixture latent Markov 
model. A latent Markov model is obtained 
either by removing W from the variables and 
equations sections or by setting its number of 
categories to 1. 

A mixture growth model is obtained by 
removing X from the variables section and 
replacing the equations section with the fol- 
lowing: 


equations
   W <- 1 + gender + ethnicity;
   drugs <- (a~) 1 | W + (b~) age | W +
      (c~) age2 | W;
   a[1] = -100;
   b[1] = 0;
   c[1] = 0;


The constraint on the intercept indicates that 
the first mixture component does not use drugs 
with probability 1. The other two constraints 
fix the redundant age and age2 effects for class 
one equal to 0. 
Chapter 24


Nonparametric methods for event 
history data: descriptive measures 
C. M. Suchindran 


1 Introduction 


An individual’s life history is often character- 
ized by a history of transitions to a number of 
life events. Occurrence of a specific event is 
considered as a transition to a particular state. 
These states can include both transient (states 
from which transitions are possible) and absorb- 
ing states (states from which escape is not pos- 
sible). Suppose that each subject begins life in 
one of several states (usually in a transient state) 
and that at each point in time will either be 
in the same state or will make a transition to 
one of the possible states. Event history data for 
a subject can be described by a set of events 
experienced by the subject and the timing of 
these transitions. Thus event history data usu- 
ally consists of timing of occurrence of multiple 
events. The data can have additional complex- 
ities with some durations in some states being 
censored. Also, it is possible that studies by 
design observe only partial history for each sub- 
ject (we will describe below a number of such 
designs). 

To illustrate the data and describe notations 
we will look at the following example describ- 
ing a woman’s marital history. A married 
woman can divorce, be widowed or die. There- 
fore, the life history will consist of three tran- 
sient states (married, divorced, and widowed) 


and an absorbing state (death). The data will 
record age at which the subject experiences a 
specific event along with the type of event. 
Without any loss of generality, denote states $1, 2, \ldots, S_1$ as transient states and states $S_1 + 1, \ldots, S$ as
absorbing. In the example above, denote state
1 as married, state 2 as widowed, state 3 as
divorced, and state 4 as death. In this case $S_1 = 3$
and $S = 4$. Let $z_n$ denote the state corresponding
to a subject's $n$th transition and let $z_0$ denote the
initial state. In the example above all women
start the observation at their first marriage and
therefore $z_0 = 1$ (married). If $T_n$ represents the
time at which the $n$th transition occurs, for a subject $i$
the event history data vector will be of the form:

\[
H_i = \{z_0, T_1, z_1, \ldots, T_m, z_m\}, \quad \text{where } 1 \le z_j \le S_1 \text{ for } j < m \text{ and } S_1 < z_m \le S.
\]

Suppose a married woman ends her marriage
by divorce at duration $T_1$ (time measured from
first marriage) and dies at duration $T_2$. Her data
vector will be of the form:

\[
H_i = \{z_0 = 1, T_1, z_1 = 3, T_2, z_2 = 4\}
\]


Note that, in this case, the event history ends 
when an absorbing state (death) is entered. In 
most observational settings, some event histo- 
ries will be incomplete due to right censor- 
ing because the subject has not reached an 
absorbing state by the time the subject was last
observed. In this case the notation (1) can still
be used by taking $z_m = S + 1$ to denote the time
$T_m$ as the censored time.

Most often studies by design collect only a 
partial event history. The usual single-event 
survival data consist of two states (one transient 
and one absorbing state) and time to transition 
to an absorbing state is noted. Usually right cen- 
soring occurs in the data. In the marital history 
example, suppose an investigator is interested 
in time to marital disruption of first marriage 
and treats widowhood and divorce together as 
an absorbing event signifying marital disrup- 
tion. For simplicity in this situation we will 
assume that death of the subject or the end of 
observation period without the occurrence of 
marital disruption or death will be treated as a 
censored observation. For presentation in this 
paper, such data will be called single spell data 
with censoring. Unlike single spell data with 
one absorbing state, the data can have one tran- 
sient state and more than one absorbing state. 
For example, in the study of marital history, 
the investigator may want to examine whether 
or not the marriage ended by widowhood or 
divorce. Once again the data can be censored 
by death of the subject or by the end of observa- 
tion while still married. We will call this data, 
single spell data with competing risks. When 
the data involves more than one transient state 
and absorbing state, we will label it as multi- 
state data. When a subject experiences an event 
of the same type (transition among two tran- 
sient states) repeatedly over time, we call the 
recorded data recurrent event data. 

In several practical situations, data observation will be further limited. For example, in a
single spell examination, it is possible that the
collected information consists of an observation
time and the knowledge that the event of interest occurred before the observation time. In this
situation, the exact time of occurrence of the event
(transition) will not be known. A similar situation can occur in the collection of recurrent
event data where we know the observation time
and the number of events that occurred before
the observation time. In this case the subject
is examined at only one point in time and the
exact times of transitions are unknown. Such
data are called current status data. Occasionally
in recurrent event situations, the data consist
of the observation time and the time elapsed
between the last occurrence of the event and
the observation time. Such data are referred to as
backward recurrence time data.

In this paper we will describe various non- 
parametric methods to analyze event history 
data. The method will depend on the type of 
available data as described earlier. In Section 2 
analysis of single spell data with censoring will 
be introduced. The analysis will be extended 
to include competing risks in Section 3. The 
multistate data description is introduced next. 
Although recurrent event data can be consid- 
ered as a special case of multistate data, meth- 
ods specific to the analysis of recurrent events 
appear in the literature. We will review them 
in Section 5. Following discussion of recurrent 
event data, the analysis of current status and 
backward recurrence data will be introduced in 
Section 6. The emphasis in this paper is on non- 
parametric measures with the goal to describe 
the data. Therefore measures based on paramet- 
ric models will not be discussed, and the paper 
will not discuss introduction of covariates in 
the data. These issues will be addressed in sub- 
sequent chapters. 


2 Single spell data with censoring 


As mentioned earlier single spell data exam- 
ines the transition from a transient state (e.g., 
alive) to an absorbing state (death). The data 
consists of time to a specified event from the 
beginning of exposure period. If at the time 
of observation the event has not occurred, the 
information for this case is considered as right 
censored and the time is measured as the time 
elapsed from the time of exposure to the time 
of observation. The main summary measure of
these data is a nonparametric estimate of the survival function. Kaplan and Meier (1958) provide an estimate of the survival function when
censoring is present in the data. Let $t_{(1)} < t_{(2)} < \cdots < t_{(J)}$ denote the distinct ordered times of
event (not counting censoring times). Let $d_j$ be
the number of events at $t_{(j)}$, and let $n_j$ be the
number not experiencing the event just before
$t_{(j)}$ (called the number exposed to risk at time $t_{(j)}$).
Then the Kaplan-Meier estimator of the survival
function is

\[
\hat{S}(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right) \tag{1}
\]


This estimator is also known as the product limit estimator. The variance of the estimated survival function is obtained by using
the Greenwood formula

\[
\mathrm{Var}\bigl(\hat{S}(t_{(i)})\bigr) = \bigl[\hat{S}(t_{(i)})\bigr]^2 \sum_{j=1}^{i} \frac{d_j}{n_j (n_j - d_j)} \tag{2}
\]


The data can be summarized by several other
important functions of the survival function.
The main summarizing function is the hazard
function $h(t)$, which expresses the relative rate
of change in the survival function. Thus the
hazard function is

\[
h(t) = -\frac{dS(t)/dt}{S(t)} \tag{3}
\]

The function $\Lambda(t) = \int_0^t h(\tau)\, d\tau$ is called the cumulative hazard function, which has the relation to
the survival function

\[
\Lambda(t) = -\log S(t)
\]


Thus a cumulative hazard function can be estimated by simply computing the negative of the
log of the estimate of the survival function. It
has an interpretation as the expected number of
events in $(0, t]$ per unit at risk of experiencing
the event. An alternative approach is to estimate the cumulative hazard function directly
using the Nelson-Aalen estimator

\[
\hat{\Lambda}(t_{(i)}) = \sum_{j=1}^{i} \frac{d_j}{n_j} \tag{4}
\]

This estimator is sometimes preferred because
of its strong theoretical justification. Breslow
(1972) suggested estimating the survival function as

\[
\hat{S}(t) = \exp\{-\hat{\Lambda}(t)\} \tag{5}
\]


The proportion of individuals experiencing the
event, or the cumulative probability of experiencing the event by time $t$, denoted as $F(t)$, is
calculated as

\[
F(t) = 1 - S(t)
\]

Quantile measures derived from the survival
function are also used to summarize the data.
For example, the median time to event ($t_m$) is calculated from the relation

\[
S(t_m) = 0.5
\]


2.1 Example 


The data for this illustration is taken from a 
demographic survey. The event of interest is 
the occurrence of the fourth birth. The expo- 
sure period starts at the time of the third birth. 
For those who had a fourth birth, time is cal- 
culated as the time elapsed between the third 
and fourth birth. Those who did not experience 
a fourth birth at the time of the survey are con- 
sidered as censored observations. For them the 
time is calculated as the time elapsed between 
their third birth and the survey date. There were 
303 women in the data set. Table 24.1 gives a 
tabulation of the data. 

The survival functions and the correspond- 
ing standard errors are calculated using 
equations (1)-(5). The survival functions cal- 
culated by the Kaplan-Meier and Nelson-Aalen 
Table 24.1 Time to fourth order birth

Columns: (1) time in years; (2) number in the risk set, n_j; (3) number of events, d_j;
(4) Kaplan-Meier S(t); (5) standard error of (4); (6) Nelson-Aalen cumulative hazard;
(7) Nelson-Aalen survival function S(t); (8) standard error of (7).

(1)    (2)    (3)    (4)       (5)       (6)       (7)       (8)
 1     303    22     0.9274    0.0149    0.0726    0.9300    0.0144
 2     249    56     0.7188    0.0271    0.2975    0.7427    0.0251
 3     169    40     0.5487    0.0313    0.5342    0.5861    0.0295
 4     116    18     0.4635    0.0323    0.6894    0.5019    0.0313
 5      80     4     0.4404    0.0327    0.7394    0.4774    0.0320
 6      69     3     0.4212    0.0331    0.7828    0.4571    0.0328
 7      59     8     0.3641    0.0342    0.9184    0.3992    0.0344
 8      47     4     0.3331    0.0346    1.0035    0.3666    0.0353
 9      37     3     0.3061    0.0351    1.0846    0.3380    0.0361
10      27     4     0.2608    0.0365    1.2328    0.2915    0.0379
11      19     2     0.2333    0.0375    1.3380    0.2624    0.0393
12      14     3     0.1833    0.0390    1.5523    0.2118    0.0411
13       6     1     0.1528    0.0428    1.7190    0.1792    0.0459


methods are usually quite close. This is partic- 
ularly true when the number of events is small 
relative to the number in the risk set. The table 
shows that nearly 15% of the women with a 
third birth did not have a fourth birth (Kaplan- 
Meier estimate) in 13 years. The median time 
to fourth birth is 3.57 years. 
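
The calculations behind Table 24.1 can be sketched as follows. The risk-set sizes and event counts are taken from the table; the function names are hypothetical. Two points are assumptions rather than statements from the text: the standard error of the Nelson-Aalen survival function is computed from the variance estimate sum of d_j / n_j^2 (which reproduces the tabulated values checked here), and the median is obtained by linear interpolation between the two tabulated times that bracket a survival probability of 0.5 (which reproduces the reported 3.57 years).

```python
import math

# (time in years, number at risk n_j, number of events d_j) from Table 24.1
ROWS = [(1, 303, 22), (2, 249, 56), (3, 169, 40), (4, 116, 18), (5, 80, 4),
        (6, 69, 3), (7, 59, 8), (8, 47, 4), (9, 37, 3), (10, 27, 4),
        (11, 19, 2), (12, 14, 3), (13, 6, 1)]


def survival_table(rows):
    """Kaplan-Meier and Nelson-Aalen summaries for each tabulated time."""
    km, greenwood, na, na_var = 1.0, 0.0, 0.0, 0.0
    out = []
    for t, n, d in rows:
        km *= 1.0 - d / n                 # equation (1)
        greenwood += d / (n * (n - d))    # sum in equation (2)
        na += d / n                       # equation (4)
        na_var += d / n ** 2              # assumed variance estimate for (4)
        s_na = math.exp(-na)              # equation (5)
        out.append({
            "time": t,
            "km": km,
            "km_se": km * math.sqrt(greenwood),
            "na": na,
            "s_na": s_na,
            "s_na_se": s_na * math.sqrt(na_var),
        })
    return out


def median_time(table):
    """Median by linear interpolation between bracketing times (assumption)."""
    prev_t, prev_s = 0.0, 1.0
    for row in table:
        if row["km"] <= 0.5:
            frac = (prev_s - 0.5) / (prev_s - row["km"])
            return prev_t + frac * (row["time"] - prev_t)
        prev_t, prev_s = row["time"], row["km"]
    return None  # survival never drops to 0.5 within the table


table = survival_table(ROWS)
median = median_time(table)
```

Rounding `table[0]` and `table[1]` to four decimals gives the first two rows of Table 24.1.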

So far we have described a method to ana- 
lyze right censored single spell data collected in 
single years. Sometimes the data record event 
times in intervals. Assuming censoring occurs 
uniformly in the interval, the exposure time for 
those who are censored in the interval is given 
as half the length of the interval. When left 
censoring is present in the data minor adjust- 
ments can also be made to the calculations of 
the risk set in calculating the survival function 
(Guo, 1993). 


3 Single spell data with competing 
risks 


Earlier we examined methods to analyze sin- 
gle spell data with no competing event present. 
For example, in the example above we have 
examined the occurrence of fourth birth in the 


absence of marital disruption or death of the 
individual. Assume that there are $k$ competing
events in the population. Chiang (1968) introduces the following quantities. Suppose events
occur at distinct ordered times $t_{(0)} = 0 < t_{(1)} < \cdots$. The data will note the time and the
type of event. Define $Q_{i\delta}$ as the crude probability of occurrence of an event of type $R_\delta$,
$\delta = 1, 2, \ldots, k$, at time $t_i$ in the presence of other
competing risks. Note that $\sum_{\delta=1}^{k} Q_{i\delta} = q_i$, the probability of occurrence of an event, regardless of
type, at time $t_i$. The goal of the analysis is to compute the probability that an individual experiences a specific event by a given duration $t$.
Using the product limit estimator we calculate
the cumulative probability of not experiencing
any event by time $t$ as

\[
S(t) = \prod_{i=0}^{t-1} [1 - q_i]
\]

The crude probability $Q_{i\delta}$ is calculated as

\[
Q_{i\delta} = \frac{d_{i\delta}}{n_i} \tag{6}
\]

where $d_{i\delta}$ is the number of events of type $R_\delta$
at time $t_i$ and $n_i$ is the number in the risk set
at that time. The cumulative probability that
an individual experiences a specific event $R_\delta$,
denoted as $CQ_{t\delta}$, is calculated as

\[
CQ_{t\delta} = \sum_{i=0}^{t} S(i)\, Q_{i\delta} \tag{7}
\]


Variance of the estimated cumulative probability is calculated as follows:

Denote $p_i = 1 - q_i$. $\mathrm{Variance}(p_i) = p_i(1 - p_i)/n_i$,
where $n_i$ is the number at risk at time $t_i$.

Also $\mathrm{Variance}(Q_{i\delta}) = Q_{i\delta}(1 - Q_{i\delta})/n_i$ and
$\mathrm{Covariance}(p_i, Q_{i\delta}) = -p_i Q_{i\delta}/n_i$.

Let $V_{t\delta}$ be the variance of the cumulative probability $CQ_{t\delta}$. This variance can be computed
using the following formula

\[
V_{t\delta} = \sum_{j=0}^{t-1} A_j^2\, \mathrm{Var}(p_j) + \sum_{j=0}^{t} B_j^2\, \mathrm{Var}(Q_{j\delta})
+ \sum_{j=0}^{t} A_j B_j\, \mathrm{cov}(p_j, Q_{j\delta}) \tag{8}
\]


where A; follows the recurrence relation A; = 


Ay-1 t+ S()Qs5, B= S(t) (By =1) and Ay = 


¥ S(t) Qe. 
isj 


3.1 Example 


A demographic survey collected marital history 
from 17,045 women. The goal is to examine the 
disruption of the first marriage. The two com- 
peting causes of marital disruption are divorce 
and widowhood. Women who are in an intact 
marriage at the time of survey are considered as 
censored observations. For them time is calcu- 
lated as the time from first marriage to the sur- 
vey. Otherwise the duration of marriage at the 
time of disruption is noted. Table 24.2 shows a 
partial tabulation of data. 

Equations (6) and (7) are used to obtain crude 
probabilities of divorce and widowhood and are 
presented in Table 24.3. The table shows that 
the probability of not experiencing marital dis- 
ruption is 0.90360 before 7 years and the prob- 
ability that the marriage will end in 7 years is 
1—0.90360 = 0.09640. The table also shows that 
the probability that the marriage will end due 
to death of spouse in seven years is 0.00315 and 
due to divorce is 0.09325. The standard errors 
calculated using equation (8) are included in 
Table 24.3. 
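
The survival and cumulative probabilities in Table 24.3 can be reproduced from the counts in Table 24.2 with a short sketch of equations (6) and (7). The function name `cumulative_probabilities` is hypothetical, and the standard errors of equation (8) are not computed here.

```python
# (risk set n_i, number divorced, number widowed) for durations 0 to 6,
# taken from Table 24.2
ROWS = [(17045, 140, 1), (16816, 211, 3), (16380, 272, 2), (15583, 256, 8),
        (14914, 232, 12), (14218, 193, 10), (13460, 193, 13)]


def cumulative_probabilities(rows):
    """Return per-duration results and the survival probability after the last row."""
    surv = 1.0                 # S(0) = 1
    cq_div = cq_wid = 0.0
    out = []
    for n, divorced, widowed in rows:
        q_div = divorced / n   # crude probability, equation (6)
        q_wid = widowed / n
        cq_div += surv * q_div # running sum in equation (7)
        cq_wid += surv * q_wid
        out.append({"S": surv, "Q_wid": q_wid, "Q_div": q_div,
                    "CQ_wid": cq_wid, "CQ_div": cq_div})
        surv *= 1.0 - (q_div + q_wid)  # product limit update
    return out, surv


results, surv_7 = cumulative_probabilities(ROWS)
```

Here `surv_7` corresponds to S(7) in Table 24.3, and the last entry of `results` holds the cumulative probabilities of widowhood and divorce reported in the text. By construction the three quantities add up to one.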


4 Multistate data 
So far we have examined event history data that 


involves one transient state and one or more 
absorbing states. In this section we will examine 


Table 24.2 Marital disruptions by divorce and widowhood

Duration of marriage   Risk set   Number     Number    Number of marital   Number still
in completed years                divorced   widowed   disruptions         married at survey
0                      17045      140         1        141                  88
1                      16816      211         3        214                 222
2                      16380      272         2        274                 523
3                      15583      256         8        264                 405
4                      14914      232        12        244                 452
5                      14218      193        10        203                 555
6                      13460      193        13        206                 539
7                      12715      174        12        186                 543
Table 24.3 Cumulative probabilities and standard errors

Columns: x = duration of marriage in completed years; q_x; S(x); Q_xw and Q_xd = crude
probabilities of widowhood and divorce; CQ_w and CQ_d = cumulative probabilities of
widowhood and divorce; SE(CQ_w) and SE(CQ_d) = their standard errors.

x   q_x       S(x)      Q_xw      Q_xd      CQ_w      CQ_d      SE(CQ_w)   SE(CQ_d)
0   0.00827   1.00000   0.00006   0.00821   0.00006   0.00821   0.000059   0.00069
1   0.01273   0.99173   0.00018   0.01255   0.00024   0.02065   0.000118   0.00109
2   0.01673   0.97911   0.00012   0.01661   0.00036   0.03691   0.000145   0.00145
3   0.01694   0.96273   0.00051   0.01643   0.00085   0.05273   0.000227   0.00173
4   0.01636   0.94642   0.00080   0.01556   0.00160   0.06745   0.000315   0.00196
5   0.01428   0.93093   0.00070   0.01357   0.00226   0.08009   0.000377   0.00213
6   0.01530   0.91764   0.00097   0.01434   0.00315   0.09325   0.000450   0.00231
7             0.90360

processes that can take multiple states that
include more than one transient state. Detailed
description of multistate data analysis can be
found in Namboodiri and Suchindran (1987).

Suppose there are a finite number of states with
two or more transient states that individuals
can move in and out of at various time points. The
event history data, possibly censored, record
the times at which various transitions occur. For
example, migration history data record an individual's movements in life until death or until
a point of observation at which the history is
censored. Let $X(t)$ denote the state occupied by
an individual at time $t$, and suppose there are $K$ states of
which $K_1$ states are transient and $K_2$ states are
absorbing ($K_1 + K_2 = K$). Under Markov assumptions, the process is governed by a set of transition probabilities that the state occupied at
time $t$ is $j$ given that the state occupied at time
$s$ ($0 \le s \le t$) is $i$, denoted as:

\[
q_{ij}(s, t) = P[X(t) = j \mid X(s) = i] \tag{9}
\]

Note that $\sum_{j=1}^{K} q_{ij}(s, t) = 1$ and, if $i$ is an absorbing state, $q_{ij}(s, t) = 1$ for $j = i$ and zero otherwise. Because $X(t)$ changes continuously with
time, the process is also sometimes described
in terms of transition intensities defined as:

\[
r_{ij}(s) = \lim_{u \to 0} \frac{q_{ij}(s, s+u)}{u}, \quad i \ne j,
\qquad \text{and} \qquad
r_{ii}(s) = \lim_{u \to 0} \frac{q_{ii}(s, s+u) - 1}{u} \tag{10}
\]

so that $\sum_{j=1}^{K} r_{ij}(s) = 0$.


The transition probabilities are usually put in a
matrix $Q(s, t)$ with its $(i,j)$th element being $q_{ij}(s, t)$,
and the transition intensities similarly form a matrix
$R(s)$. Several summary measures of the
process can be computed.

State occupancy probabilities provide the
probability of being in a particular state after a
longer period given the initial state. Formally, it
is the probability that an individual is in state $j$
at time $t$, given that the individual occupies state
$i$ at time $s$. Assuming a partitioning of the interval $(s, t)$ as $s = t_0 < t_1 < \cdots < t_n = t$, the
state occupancy probabilities are the elements
of the matrix

\[
Q(s, t) = \prod_{h=0}^{n-1} Q(t_h, t_{h+1}) \tag{11}
\]
h=0 


A second useful measure is the expected length
of time spent in a specified state. Denote
$e_{ij}(s, t)$ as the time spent in state $j$ during
the interval $(s, t)$ for an individual in state $i$ at
time $s$. Form the corresponding matrix $E(s, t) = \{e_{ij}(s, t)\}$. Then

\[
E(s, t) = \int_s^t Q(s, \tau)\, d\tau \tag{12}
\]

A third useful measure is the expected number
of visits to specific states in a specified time
interval. Let $m_{ij}(s, t)$ denote the number of visits
to state $j$ in the interval $(s, t)$ for an individual in
state $i$ at time $s$. Form the corresponding matrix
$M(s, t)$. Then

\[
M(s, t) = \int_s^t Q(s, u)\, B(u)\, du \tag{13}
\]


where $B(u)$ is obtained from $R(u)$ by replacing
its diagonal elements with zero. Event history
data allow one to compute transition probabilities and the corresponding summary measures.
For convenience, tabulate the data in small intervals $(t, t+h)$. Let $d_{ij}(t, t+h)$ denote the number
of transitions from state $i$ to state $j$ in the interval. Also denote $n_i(t)$ as the number of individuals in state $i$ at time $t$ and $c_i(t, t+h)$ the
number of individuals in state $i$ at time $t$ who
were censored in the interval $(t, t+h)$. Then the
transition probability can be estimated as:

\[
\hat{q}_{ij}(t, t+h) = \frac{d_{ij}(t, t+h)}{n_i(t) - \tfrac{1}{2} c_i(t, t+h)} \tag{14}
\]

Use these estimated transition probabilities to
form the matrix $\hat{Q}(t, t+h)$. The state occupancy probabilities at time $t$ are calculated as

\[
\prod_{h=0}^{n-1} \hat{Q}(t_h, t_{h+1}).
\]

4.1 Example 


The following data on contraceptive use was 
created to illustrate the method. In this setup 
the multistate process has three states: use, non- 
use and pregnancy. The use and non-use states 


are considered as transient because women can
move back and forth between these states. For
illustration, in this example we treat pregnancy as an absorbing state. The dataset contained the contraceptive history of 1835 women,
among whom 1765 accepted contraception at
the time of recruitment. The remaining 70
women started in the non-use state. The following
table shows the transitions in the first month:

                Destination state
Origin state    Use     Non-use   Pregnant   Censored
Use             1721    23        18         3
Non-use           27    39         3         1

Equation (14) was used to calculate the transition probability matrix for month 1, Q(0,1):

                Destination state
Origin state    Use      Non-use   Pregnant
Use             0.9881   0.0102    0.0017
Non-use         0.3885   0.5683    0.0432
Pregnant        0        0         1

Transition data during month 2 (labeled as 1-2)
were tabulated as:

                Destination state
Origin state    Use     Non-use   Pregnant   Censored
Use             1703    27        18         0
Non-use           13    45         3         1

The corresponding transition matrix, Q(1,2), is calculated as:

                Destination state
Origin state    Use      Non-use   Pregnant
Use             0.9742   0.0154    0.0103
Non-use         0.2113   0.7398    0.0487
Pregnant        0        0         1
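
As a hedged illustration of equation (14), the sketch below applies the estimator to the month 2 counts. The function name `transition_row` is hypothetical, and it assumes that the tabulated destinations plus the censored cases account for everyone in the origin state at the start of the interval. The Use row and the off-diagonal entries of the Non-use row agree with Q(1,2) to rounding; the text does not spell out how the published Non-use diagonal entry was obtained, so that entry is not checked here.

```python
def transition_row(counts, censored):
    """Estimate one row of the transition matrix with equation (14).

    counts   -- numbers moving to each destination state during the interval
    censored -- number censored during the interval
    """
    n_start = sum(counts) + censored   # n_i(t), assumed fully accounted for
    denom = n_start - censored / 2.0   # half of the censored cases removed
    return [d / denom for d in counts]


# Month 2 counts: destinations are Use, Non-use, Pregnant.
use_row = transition_row([1703, 27, 18], censored=0)
nonuse_row = transition_row([13, 45, 3], censored=1)
```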
The contraceptive status at the end of month
2 is calculated using equation (11) as Q(0,2) =
Q(0,1) * Q(1,2). The resulting matrix is:

                 Destination state (at the end of month 2)
Origin state     Use      Non-use   Pregnant
(at time zero)
Use              0.9648   0.0228    0.0124
Non-use          0.4986   0.4265    0.0749
Pregnant         0        0         1
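
The product in equation (11) is an ordinary matrix multiplication, as the following sketch shows using the two published monthly matrices. The helper name `mat_mul` is hypothetical; because the inputs are rounded to four decimals, the product matches the published Q(0,2) only to about the fourth decimal.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Rows and columns are ordered Use, Non-use, Pregnant.
Q01 = [[0.9881, 0.0102, 0.0017],
       [0.3885, 0.5683, 0.0432],
       [0.0, 0.0, 1.0]]
Q12 = [[0.9742, 0.0154, 0.0103],
       [0.2113, 0.7398, 0.0487],
       [0.0, 0.0, 1.0]]

Q02 = mat_mul(Q01, Q12)  # state occupancy at the end of month 2
```

Repeating the multiplication with each further monthly matrix extends the calculation to later months.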


The results show that the probability that a non-user at the beginning of the study will be in the
use state at the end of month two is 0.4986 and in
the pregnant state is 0.0749. Repeating the calculations to the end of six months (data not shown):

                 Destination state (at the end of month 6)
Origin state     Use      Non-use   Pregnant
(at time zero)
Use              0.8606   0.0789    0.0605
Non-use          0.6012   0.2369    0.1619
Pregnant         0        0         1


Duration of contraceptive use for those who are
in the use state or the non-use state can be calculated
using equation (12). When data are tabulated in
one-unit intervals (one month in the example here), the integral can be approximated by the sum
$E(0, t) = \sum_{i=0}^{n-1} Q(0, t_{i+1})$. A simple linear approximation can be to set the interval length to one if
the interval length is more than one unit. Calculations based on the data show that the average
length of stay in various contraceptive states at
the end of six months is as follows:

                 Average duration (in months) in state
Initial state    Use     Non-use
Use              5.52    0.30
Non-use          3.14    2.21

The data shows that a woman in the non-use 
state at the beginning of the study will spend 
only 2.21 months in the non-use state during 
the first six months of observation. 

A third summary measure is the
average number of visits to each state during a
fixed time period of observation for a woman
starting in a specific state. For this purpose
we will use equation (13). In order to calculate
the B matrix in equation (13) we will use the
following relationship between the transition
matrix (Q) and the transition intensity matrix
R. Let $Q_{11}(s, t)$ denote the submatrix indicating
transitions among the transient states, with corresponding transition intensity matrix
$R_{11}(u)$ for $u$ in the interval $(s, t)$. Then

\[
R_{11} = -\left[ (I - Q_{11}(s, s+h)) + \frac{(I - Q_{11}(s, s+h))^2}{2} + \frac{(I - Q_{11}(s, s+h))^3}{3} + \cdots \right] \tag{15}
\]

In the example above, $Q_{11}(0, 1)$ is:

                Destination state
Origin state    Use      Non-use
Use             0.9881   0.0102
Non-use         0.3884   0.5683

Using equation (15) the resulting $R_{11}$ matrix for
the interval is:

                Destination state
Origin state    Use        Non-use
Use             -0.01470    0.0134
Non-use          0.5083    -0.5640

The B matrix is obtained by setting the diagonal
elements of the $R_{11}$ matrix to zero.
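
The series in equation (15) can be evaluated numerically as in the sketch below. The function names and the number of terms are hypothetical choices; the text does not state how many terms were retained in the original calculation, so agreement with the published R_11 matrix should be expected to be approximate only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def intensity_from_transition(q, terms=200):
    """Evaluate -[(I-Q) + (I-Q)^2/2 + (I-Q)^3/3 + ...] term by term."""
    n = len(q)
    diff = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
            for i in range(n)]                       # I - Q
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        power = mat_mul(power, diff)                 # (I - Q)^k
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j] / k
    return [[-total[i][j] for j in range(n)] for i in range(n)]


Q11 = [[0.9881, 0.0102],
       [0.3884, 0.5683]]
R11 = intensity_from_transition(Q11)

# B matrix: R11 with its diagonal elements replaced by zero.
B = [[0.0 if i == j else R11[i][j] for j in range(2)] for i in range(2)]
```

The series converges here because the elements of I - Q_11 are small; for transition matrices far from the identity more care would be needed.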

In the example here the integral in equation
(13) is approximated as:

\[
M(0, 6) = \sum_{i} Q_{11}(0, t_i)\, B(t_i)
\]

The results show the average number of visits
specific to each initial state as follows, $M_{11}(0, 6)$:

                Destination state
Origin state    Use      Non-use
Use             0.0161   0.0814
Non-use         0.7430   0.0362

The result can be interpreted as follows. A 
cohort of 1000 women who at the beginning of 
observation are in the use state will, on an aver- 
age, make 81 transitions to the non-use state 
and 16 revisits to the user state in six months. 
Similarly, 1000 women starting in the non-use 
state will, on an average, make 743 visits to the 
user state and 36 revisits to the non-use state. 


5 Recurrent event data 


Recurrent event data is generated when a sub- 
ject experiences a specific event several times 
during the period of observation. One feature 
of this data is that event times, possibly cen- 
sored, are ordered and correlated. Such data are 
frequently observed in longitudinal follow-up 
studies. Event history recorded through retro- 
spective studies also generates recurrent event 
data. For example, several demographic surveys 
collect timing of child birth to women or migra- 
tion events through retrospective enquiries. In 
this data the event history will be censored at 
the duration at the time of survey. Unlike in ret- 
rospective survey data, in a prospective survey 
the subject can experience another competing 
risk such as death (transition to an absorbing 
state). In such a situation the event history will 
be recorded only up to the point of death. If 
death has not occurred the history will be cen- 
sored at the time of last observation. In this 
section we will present measures to summa- 
rize recurrent event history data without the 


presence of an absorbing event (death). Modi- 
fications to include the competing event in the 
analysis can be found in Ghosh and Lin (2000). 

Suppose that an individual is observed over
a time period $[0, \tau]$. Let $N(t)$ denote the number
of events occurring over the time interval $[0, t]$.
Then the mean cumulative function (MCF) is
defined as

\[
M(t) = E[N(t)], \quad \text{where } E \text{ stands for expectation.}
\]

The MCF can also be expressed in terms of the
renewal density or the occurrence rate. Define
the renewal density as

\[
m(t) = \lim_{\delta t \to 0} \frac{\mathrm{Prob}[\text{event occurs in } (t, t + \delta t)]}{\delta t}
\]

Then $M(t) = \int_0^t m(u)\, du$ or $m(t) = M'(t)$. Following Lawless and Nadeau (1995), we present here
nonparametric estimates of the mean cumulative
function when data on recurrent events are censored.

Suppose an individual $i$ ($i = 1, 2, \ldots, K$) is
observed in the time interval $[0, \tau_i]$ and
the event times for the individual $i$ are
$t_{i1}, t_{i2}, \ldots, t_{ik_i}$. Denote $\delta_i(t) = 1$ if $t \le \tau_i$ and $= 0$
otherwise. Using these notations one can write
the total number of individuals at risk to have
an event at time $t$ as $r_t = \sum_{i=1}^{K} \delta_i(t)$. Let $n_i(t) \ge 0$
be the number of events for individual $i$ that
occur at time $t$ (usually zero or 1). The total number of events occurring at time $t$ is denoted as
$n(t) = \sum_{i=1}^{K} \delta_i(t)\, n_i(t)$. Note that if the event times
are distinct, $n(t) = 1$ for all event times. Then
the Nelson-Aalen estimator of the mean cumulative function is (Anderson et al., 1993)

\[
\hat{M}(t) = \sum_{s=1}^{t} \frac{n(s)}{r_s} \tag{16}
\]

Note that an estimator of the occurrence rate can be
calculated as

\[
\hat{m}(t) = \frac{n(t)}{r_t} \tag{17}
\]
Lawless and Nadeau (1995) provide a robust variance estimator of
$\hat{M}(t)$ as

$$\widehat{\mathrm{Var}}[\hat{M}(t)] = \sum_{i=1}^{K} \left\{ \sum_{s=1}^{t} \frac{\delta_i(s)}{r(s)} \left[ n_i(s) - \frac{n(s)}{r(s)} \right] \right\}^2. \qquad (18)$$

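Assume that event times are distinct (if multiple events share a time,
we list the uncensored cases first, followed by the censored cases).
Let j index the ordered observation times and let $r_j$ be the number
at risk at time j. By definition $r_0 = K$ (note that $r_0$ is the
number at risk just prior to the first observed time of occurrence).
Then

$$r_j = \begin{cases} r_{j-1} & \text{if } j \text{ is an event time} \\ r_{j-1} - 1 & \text{if } j \text{ is a censored time.} \end{cases}$$

With this notation we have, at the event times,

$$\widehat{MCF}_0 = \frac{1}{r_0} \quad \text{and} \quad \widehat{MCF}_j = \frac{1}{r_j} + \widehat{MCF}_{j-1}. \qquad (19)$$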


A simple recurrence formula to compute the robust variance can be
written as follows:

$$\mathrm{Var}_j = \mathrm{Var}_{j-1} + \frac{1}{r_j^2} \sum_{i \in R_j} \left( d_{ij} - \frac{1}{r_j} \right)^2, \qquad (20)$$

where $R_j$ is the risk set at time j, $d_{ij} = 1$ if the ith
individual had an event at time j, and $d_{ij} = 0$ if the ith
individual had no event at time j. Confidence bounds for the cumulative
number of events are usually calculated under the assumption that the
recurrence time follows a lognormal distribution. Specifically, the
lower and upper bounds for log(MCF) are calculated as

$$\log(\widehat{MCF}_j) \pm z_{\alpha} \frac{\sqrt{\mathrm{Var}_j}}{\widehat{MCF}_j}. \qquad (21)$$

Exponentiation of the confidence limits for $\log(\widehat{MCF}_j)$
gives the confidence interval for $MCF_j$.


5.1 Examples 


In order to illustrate the method we use birth 
history data of 5 women recorded in a fertility 
survey at a cross-sectional point. 


Table 24.4 Birth history of women

Woman   Ages at birth
1       15, 17, 18, 20, 25, 30, 32+
2       17, 23, 24+
3       24, 25+
4       23, 28+
5       20, 23+

+ indicates censoring


In Table 24.5 the data of Table 24.4 are sorted in ascending order by
age at birth and censoring. When there is a tie it is assumed that
event times precede the censoring time. On the basis of this assumption
a risk set is calculated at each age at which there is a birth.
Equations 17, 19, 20, and 21 are then used to complete the table.
Table 24.5 shows that in this sample a woman will have, on average,
3.2 children (with a confidence interval of 2.3 to 4.4) by age 30. To
get a realistic view, a new dataset with 500 currently married women
was created with birth history data similar to Table 24.4. The values
of the mean cumulative function at selected ages for these data are
shown in Table 24.6. The table shows that a married woman in continuous
marriage will have, on average, seven children by age 40, with a
confidence interval of (6.5, 7.5).


6 Current status data on recurrent 
events 


Complete data (except for censoring) on recurrent events usually
consist of the number of events that have occurred and the timing
Table 24.5 Calculation of mean cumulative function*

ID  Age  Censoring status       r_j  1/r_j   M(t)    Lower limit  Upper limit
         (1 = birth, 0 = censored)
1   15   1                      5    0.2000  0.2000  0.045719     0.874911
1   17   1                      5    0.2000  0.4000  0.140881     1.135713
2   17   1                      5    0.2000  0.6000  0.255922     1.406678
1   18   1                      5    0.2000  0.8000  0.382493     1.673235
1   20   1                      5    0.2000  1.0000  0.516851     1.934792
5   20   1                      5    0.2000  1.2000  0.656933     2.192004
2   23   1                      5    0.2000  1.4000  0.801453     2.445560
4   23   1                      5    0.2000  1.6000  0.949545     2.696028
5   23   0                      4
3   24   1                      4    0.2500  1.8500  1.132471     3.022153
2   24   0                      3
1   25   1                      3    0.3333  2.1833  1.372788     3.472350
3   25   0                      2
4   28   0                      1
1   30   1                      1    1.0000  3.1833  2.315633     4.376081
1   32   0                      0

*For calculation of confidence interval z_α = 1.65

Table 24.6 Cumulative mean function (births) for a hypothetical cohort
of married women without marital disruption (illustrative data from a
demographic survey)

Age  Risk set  M(t)   Lower limit  Upper limit
15   500       0.176  0.142844     0.216851
20   459       1.385  1.283716     1.493680
25   306       2.864  2.706247     3.032014
30   159       4.269  4.044511     4.506071
35   61        5.804  5.466983     6.163056
40   20        6.978  6.492089     7.901766

of each event at the time of recording. However, on many occasions the
recorded data will consist only of the number of events and the time of
recording. For example, some demographic surveys record only the number
of children ever born and the age at survey. Table 24.7 gives
illustrative data on the number of children born to 47 women who
married after age 20 and were in an intact marriage at the time of the
survey. The goal of the analysis is to compute the mean cumulative
function from the current status data.

Suppose there are K independent subjects and that subject i reports
$N_i(t_i)$ events at the time of survey $t_i$. The goal is to estimate
the mean cumulative function $M(t) = E[N(t)]$ for t = 1, 2, ..., T,
based on the current status data. Suppose that the K subjects have
different observation times (we will relax this assumption later).
Direct information about M(t) is available at the observation times
$t_1, t_2, \ldots, t_K$. Note that M(t) should be a non-decreasing
function of t. The directly observed M(t) may not meet this condition.
Sun and Kalbfleisch (1993)
Table 24.7 Distribution of married women by age at survey and number of
children ever born

Rows: age at survey (21 to 35, in single years)
Columns: distribution of married women by number of children at survey
(1, 2, 3, 4, 5, 6, 7); total women



proposed isotonic regression to obtain estimates of M(t) that satisfy
the required condition. Specifically, they advocated the
pool-adjacent-violators algorithm (Barlow et al., 1972) to obtain
$\hat{M}(t)$. The procedure can be briefly described as follows. Rank
the observation times and compute the mean response at each time point.
If any two adjacent means are out of order
[$\hat{M}(t+1) < \hat{M}(t)$], then the observations in these two means
are combined to form a block, and the pooled block mean is computed. If
any two block means are out of order, the observations in these blocks
are pooled to form a new block mean. This process is continued until
all block means are in proper order. Confidence intervals for the
estimated non-decreasing mean cumulative function can be computed as

$$\max_{j \le i} \left\{ \hat{M}(t_j) - s_{\alpha} \hat{\sigma} / \sqrt{n_j} \right\} \le M(t_i) \le \min_{j \ge i} \left\{ \hat{M}(t_j) + s_{\alpha} \hat{\sigma} / \sqrt{n_j} \right\},$$

where $s_{\alpha}$ denotes the upper $\alpha$ point of the studentized
maximum modulus distribution with parameter K, $n_j$ is the observed
number of subjects at time $t_j$, and $\hat{\sigma}^2$ is the pooled
estimate of the variance (Korn, 1982). These confidence intervals are
obtained directly from the sample means. Table 24.8 illustrates the
method using the data in Table 24.7. Published values of $s_{\alpha}$
are available in Hahn and Hendrickson (1971). (For large values of K,
$s_{\alpha}$ can be approximated by the standard normal distribution.)
The observed mean numbers of children at various ages, based on the
data, are given in the second column of Table 24.8. Because the
observed means are not monotonic, we obtained the pooled estimates
shown in column 4 of Table 24.8. For these data the pooled standard
deviation is estimated to be 1.318. There were 15 ages represented in
the data. For a confidence level of 90% the Student's t value is 1.753.
The calculated confidence intervals are shown in Table 24.8. The
results show that a 35-year-old woman will, on average, have 5
children, with a confidence interval of (3.97, 6.03).
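
The pooling step described above can be sketched as follows. This is a
minimal weighted version of the pool-adjacent-violators idea, with the
function name `pool_adjacent_violators` chosen here for illustration;
the inputs are the observed means and numbers of women from Table 24.8.
It covers only the smoothing step, not the confidence intervals.

```python
def pool_adjacent_violators(means, weights):
    """Weighted non-decreasing fit obtained by pooling adjacent violators."""
    # Each block is [pooled mean, total weight, number of time points pooled].
    blocks = []
    for mean, weight in zip(means, weights):
        blocks.append([mean, weight, 1])
        # Merge backwards while the last two block means are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)   # every pooled point gets the block mean
    return fitted

# Observed M(t) and number of women at ages 21 to 35 (Table 24.8).
observed = [1.0, 1.0, 1.1666, 1.5, 1.6, 2.1429, 2.0, 2.0,
            2.2, 2.3333, 3.0, 4.0, 3.75, 5.0, 5.0]
women = [1, 3, 6, 4, 5, 7, 1, 2, 5, 3, 1, 3, 4, 1, 1]

smoothed = pool_adjacent_violators(observed, women)
for age, value in zip(range(21, 36), smoothed):
    print(age, round(value, 3))
```

Ages 26 to 28 are pooled into one block, as are ages 32 and 33, which
is where the observed means are out of order.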
Table 24.8 Mean cumulative function and confidence intervals

Current age  Observed M(t)  Number of women  Smoothed M(t)  Lower confidence limit  Upper confidence limit
21           1.0000         1                1.000          0.000                   3.310
22           1.0000         3                1.000          0.000                   3.310
23           1.1666         6                1.167          0.000                   3.310
24           1.5000         4                1.500          0.000                   3.386
25           1.6000         5                1.600          0.000                   3.386
26           2.1429         7                2.100          0.367                   3.721
27           2.0000         1                2.100          0.367                   3.721
28           2.0000         2                2.100          0.367                   3.721
29           2.2000         5                2.200          0.643                   3.757
30           2.3333         3                2.333          0.821                   3.846
31           3.0000         1                3.000          1.666                   4.334
32           4.0000         3                3.857          2.845                   5.155
33           3.7500         4                3.857          2.845                   5.155
34           5.0000         1                5.000          3.967                   6.033
35           5.0000         1                5.000          3.967                   6.033


7 Backward recurrent times 


Event history data sometimes record, at the time of a survey, only the
time elapsed since the last event. For example, a fertility survey may
record the time elapsed between the last live birth and the survey.
Such data have the advantage that the respondent can accurately recall
the timing of the most recent event. Such data are referred to as
backward recurrence time data (Allison, 1985). One natural question is
whether information on backward recurrence time can be used to
summarize the distribution of the interevent times. The standard
survival analysis techniques are not directly applicable to these data
because, in theory, all observations are censored. Allison (1985)
proposed several methods to conduct regression analysis of backward
recurrence times. Other applications can be found in Ali et al. (2001)
and Keiding et al. (2002).

In this paper we look at a nonparametric method to estimate the
distribution of the interevent times. Denote by g(y) the density
function of the backward recurrence time and by f(x) the density of the
corresponding interevent time. When the interevent times have a common
distribution with density f(x) (with the corresponding distribution
function F(x)), relationships between g(y) and f(x) can be established.
If the survey time is relatively far from the beginning of the process,
the limiting distribution of the backward recurrence time is related to
the distribution of the interevent times as:

$$g(y) = \frac{1 - F(y)}{\mu},$$

where $\mu = E(X)$, the mean of the interevent time.

This relationship implies that $\mu = 1/g(0)$. Using these
relationships, a nonparametric estimate of the survival function
$S(y) = 1 - F(y)$ is obtained as:

$$\hat{S}(y) = \frac{\hat{g}(y)}{\hat{g}(0)}. \qquad (22)$$
Table 24.9 Distribution of backward recurrence time

Time since last live birth  Proportion of women  Smoothed proportion  S(y) = g(y)/g(0)
(in completed years)        (N = 500)            g(y)
0                           0.110                0.3622               1
1                           0.294                0.3008               0.83048
2                           0.256                0.2394               0.66096
3                           0.168                0.1780               0.49144
4                           0.114                0.1166               0.32192
5                           0.058                0.0552               0.15240


However, it is shown that the nonparametric 
maximum likelihood estimator of a decreas- 
ing density is inconsistent at 0 (Sun and 
Woodroofe, 1996). One solution is to estimate 
g(0) using the observed histogram after smooth- 
ing. Usually a smoothing algorithm such as 
loess (Cleveland and Devlin, 1988) can be used. 


7.1 Example 


We use data extracted from a demographic survey that recorded the birth
history of women. Five hundred records of women in intact marriage and
with at least one birth were extracted. For these women, the backward
recurrence time (the time elapsed, in years, since the last live birth
at the time of the survey) is noted. The distribution of the observed
backward recurrence time is given in Table 24.9. Note that the observed
g(0) is smaller than most of the other proportions, although g(y)
should be a decreasing density. Therefore, a smoothed value is
calculated. In this case a prediction based on a linear predictor
fitted to the time values of one year or more is used. The estimate
g(0) = .3622 is obtained, giving a mean birth interval of 2.76 years.
The calculated survival function based on equation 22 is presented in
Table 24.9. Based on the survival function, the median birth interval
is calculated to be 2.95 years.
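
The numbers in this example can be reproduced with a short Python
sketch. Two details are assumptions made here rather than statements
from the text: the linear predictor is fitted by ordinary least squares
to the observed proportions at 1 to 5 years, and the median is found by
linear interpolation of the estimated survival function. The helper
names are illustrative.

```python
years = [0, 1, 2, 3, 4, 5]
observed = [0.110, 0.294, 0.256, 0.168, 0.114, 0.058]   # Table 24.9

def linear_fit(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def median_from_survival(times, surv):
    """Linear interpolation for the time at which S(y) crosses 0.5."""
    for k in range(1, len(times)):
        if surv[k] <= 0.5:
            return times[k - 1] + (surv[k - 1] - 0.5) * (
                times[k] - times[k - 1]) / (surv[k - 1] - surv[k])
    return None   # survival never drops to 0.5 in the observed range

# Fit on years 1..5 only; the observed value at 0 is the one being replaced.
intercept, slope = linear_fit(years[1:], observed[1:])
smoothed = [intercept + slope * y for y in years]

g0 = smoothed[0]                          # smoothed estimate of g(0)
mean_interval = 1.0 / g0                  # mu = 1 / g(0)
survival = [g / g0 for g in smoothed]     # equation 22
median_interval = median_from_survival(years, survival)

print(round(g0, 4), round(mean_interval, 2), round(median_interval, 2))
```

The fitted values agree with the smoothed proportions and the survival
column of Table 24.9.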


8 Summary 


This paper summarizes a number of nonparametric techniques to describe
event history data. No attempt is made to conduct subgroup or covariate
analysis. Although summary measures are rather easy to obtain,
computing their standard errors is not always straightforward. For
example, standard errors of summary measures for multistate data are
not easily obtained. Beyond single-spell data without competing risks,
procedures for computing summary measures and their standard errors are
not available in standard statistical packages.
Chapter 25


The Cox proportional hazards model, 
diagnostics, and extensions 
Janet M. Box-Steffensmeier and Lyndsey Stanfill 


Event history analysis (also referred to as 
duration, survival, or reliability analysis) is a 
technique that allows researchers to assess the 
implicit risk of an event occurring. That is, 
we consider not only whether an event occurs, 
but when. Event history analysis provides an 
understanding of the timing and history lead- 
ing up to the event, from which we can draw 
inferences about the process. However, we need 
to check several diagnostics, in order to have 
confidence in our inferences. This chapter dis- 
cusses the Cox proportional hazards model 
and the necessity for diagnostics. Cox mod- 
eling extensions to account for dependence, 
such as the conditional frailty model, are also 
presented. 


1 Introduction 


An event is a change from one state to 
another; examples include death, marriage, fall 
of governments, or dissertation completion. The 
dependent variable in an event history model 
is the time until the event occurs. Event his- 
tory models are well-suited for longitudinal 
analysis because they can easily accommo- 
date problems common to longitudinal data 
such as censored observations and time-varying 


covariates (explanatory variables).¹ Censoring


occurs when information about an observation 
is incomplete, such as when an observation has 
not changed from one state to another when 
the data collection process ends. Time-varying 
covariates can take on different values over 
time for a single observation. Censoring and 
time-varying covariates often present statistical 
problems that can be overcome by a survival 
model. 

Two types of event history models are para- 
metric and semiparametric models. Parametric 
models assume that the time until an event 
occurs follows a specific distribution, such 
as the exponential, and the distribution of 
when the events happen can be thought of as 
time dependency in the data. Parametric dis- 
tributions are most often used in engineering 
when the analyst has a strong understanding 
of the distribution of the risk of failing with 
respect to time. The primary advantage of para- 
metric event history models is the ability to 
forecast. 


¹ Klein and Moeschberger (1997), Hosmer and
Lemeshow (1999), Therneau and Grambsch (2000),
Blossfeld and Rohwer (2001), Singer and Willett
(2003), and Box-Steffensmeier and Jones (2004) all
provide useful texts on event history.
Conversely, semiparametric models do not
specify a distributional shape for the timing 
of events; rather semiparametric models are 
parameterized by the explanatory variables. 
A semiparametric model is more appropriate 
when the primary objective is to understand the 
impact of covariates on the risk of an event. 
In this situation, duration dependence is con- 
sidered a nuisance. Time-dependency can be 
thought of as the “left over” effects of time 
after the hazard rate has been conditioned by 
the covariates. If the model had been perfectly 
specified, there would be no time-dependency 
because the hazard rate would be fully charac- 
terized by the covariates. 

The Cox proportional hazards model is the most commonly used
semiparametric event history model. It estimates the impact of
covariates on the risk of an event without specifying a priori a
distribution for the duration dependence or making any assumptions
about when an event occurs. By contrast, the inferences drawn from a
parametric model may be misleading if its distribution is incorrectly
specified.² In the Cox model, the hazard rate is parameterized only by
the covariates of interest. Typically, in social science research,
theory is not strong enough to correctly specify a distributional shape
for the timing process, and thus the less restrictive Cox model is
preferred. One advantage of the Cox model is the wide variety of
diagnostic tests that have been developed.


2 Cox proportional hazards model 


Key concepts for estimating and understanding the Cox model are the
hazard rate, risk set, and survival function. A hazard rate can be
thought of as the probability that an event will occur for a particular
observation at a particular time, or the rate at which an event occurs
for an observation at time t given that the observation has survived
through time t-1. In the Cox model, the hazard rate for the ith
individual is given by:

$$h_i(t) = h_0(t) \exp(\beta' x_i)$$

Here $h_0(t)$ is an unspecified baseline hazard rate shared by all
observations, and $x_i$ is a vector of covariates. A Cox model does not
report an intercept; instead the intercept is absorbed into the
baseline hazard function. However, should the researcher have a need
for the baseline hazard rate, it can be calculated from the estimates.
The baseline hazard rate can be thought of as the hazard rate when all
of the covariates equal zero. Therefore, any change to the hazard rate
is a function of the values of the covariates. Hazard rates are
substantively interesting to researchers who seek to understand how an
event is conditional on its history.

² See Box-Steffensmeier and Jones (2004) and Golub
(forthcoming) for a more elaborate discussion of the
advantages of the Cox model.

The risk set includes all of the observations that are still at “risk”
of experiencing the event. Risk is an implicit aspect of the Cox model
because the hazard rate is derived from the risk set. Once an
observation experiences the event at time T (changes from one state to
another) it drops out of the risk set and is no longer part of the
dataset being analyzed in later periods (t > T). Instead, the
observation is now incorporated into the failure rate. In survival
models, the hazard rate is the ratio of the failure rate to the
survival function:

$$h(t) = \frac{f(t)}{S(t)}$$

where h(t) is the hazard rate, f(t) is the failure rate, and S(t) is
the survival function. When an object is still at risk, it is
incorporated into the model through the survival function. This is what
allows survival models to integrate censored observations into the
model. An observation that is censored is simply still in the survival
function at the end of the period of study.
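
As a small numerical illustration of the ratio h(t) = f(t)/S(t), the
sketch below evaluates it for a Weibull distribution. The Weibull form
and the parameter values are assumptions chosen here for illustration;
with shape 1 the distribution reduces to the exponential, whose hazard
is constant.

```python
import math

def weibull_density(t, shape, scale):
    """Failure density f(t) of a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(
        -((t / scale) ** shape))

def weibull_survival(t, shape, scale):
    """Survival function S(t) of a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale):
    """Hazard rate as the ratio of the density to the survival function."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)

# Shape 1 (exponential): the hazard is 1/scale at every t.
print([round(hazard(t, 1.0, 4.0), 4) for t in (1.0, 2.0, 8.0)])
# Shape 2: the hazard rises with t.
print([round(hazard(t, 2.0, 4.0), 4) for t in (1.0, 2.0, 8.0)])
```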


3 Cox model residuals 


Unlike least-squares residuals, which are the distance between the
observed and predicted values of an observation, a duration model does
not always provide a systematic component for censored observations,
because it is estimated via partial maximum likelihood
(Box-Steffensmeier and Zorn, 2001; Hosmer and Lemeshow, 1999). However,
several different kinds of residuals have been developed for assessing
the adequacy of the model. Some of the most common and useful residuals
are:


■ Cox-Snell residuals, used to assess the overall fit of a posited
  model
■ martingale residuals, used to assess the functional form of
  covariates and to compute additional residuals
■ score residuals, used to assess the potential influence of an
  observation on the estimated coefficients
■ deviance residuals, used to detect outliers
■ Schoenfeld residuals, which are critical for testing the proportional
  hazards assumption.


Cox-Snell residuals are based on the expected number of events in a
given time interval, or an expected count. Recall from above that the
Cox model,

$$h_i(t) = h_0(t) \exp(\beta' x_i),$$

estimates survival times $\hat{S}_i(t)$. If the model is adequate, the
estimated survival times $\hat{S}_i(t)$ should be very similar to the
actual survival times $S_i(t)$. Cox-Snell residuals assess the
relationship between the estimated and actual survival times. The
residual is given by:

$$r_{CS_i} = \exp(\hat{\beta}' X_i)\, \hat{H}_0(t_i)$$

where $\hat{H}_0(t_i)$ is the estimated cumulative baseline hazard. If
the correct model has been fit to the data, then $r_{CS_i}$ has a unit
exponential distribution, that is, a hazard rate equal to one (for
further discussion, see Box-Steffensmeier and Jones, 2004; Collett,
1994; Cox and Snell, 1968; Klein and Moeschberger, 1997). The Cox-Snell
residuals are most often used to assess how well the model fits the
data, which is discussed later in the chapter.

The martingale residuals use a “counting process” approach. To
understand the intuition, Therneau and Grambsch say to think of each
observation “...as the realization of a very slow Poisson process”
where “censoring is not incomplete data, rather the Geiger counter just
hasn’t clicked yet” (2000, p. 68). This concept has given researchers a
way to overcome the problem of not having an actual failure time for
every observation.

The counting-process representation of the Cox model is a linear-like
model that counts whether the event occurs at time t:

$$\delta_i(t) = H_i(t) + M_i(t)$$

and rearranging terms:

$$M_i(t) = \delta_i(t) - H_i(t)$$

To explain, $\delta_i(t)$ is a censoring indicator. Each observation
receives a zero for the censoring indicator until it experiences an
event. In the time period in which the observation experiences the
event, $\delta_i(t) = 1$, and it equals one for every time period
afterwards. $H_i(t)$ is the hazard or the risk of the event occurring
for an observation at each time period. After an observation
experiences an event, $H_i(t) = 1$ for each time period after the
event.

$M_i(t)$ is referred to as the martingale and can be thought of as the
error component. The martingale has the same properties as the
least-squares residuals: the mean of the residuals is zero
($E(M_i) = 0$) and there is no covariation in the residuals across
observations ($\mathrm{cov}(M_i, M_j) = 0$). The martingale is
equivalent to the censoring indicator minus the Cox-Snell residuals:

$$M_i(t) = \delta_i(t) - r_{CS_i}$$

Thus, we see that the martingale residuals can be used to compute other
types of residuals, but the martingale residual can also be used to
assess the functional form of a covariate.

Martingale residuals can also be used to create score residuals for
each covariate. Score residuals can then be used to assess the
potential influence of an observation on the estimated coefficients.
Score residuals for the ith subject on the kth covariate are calculated
as:

$$L_{ik} = \int_0^{\infty} \left[ X_{ik}(t) - \bar{X}_k(t) \right] dM_i(t)$$

where $\bar{X}_k(t)$ is the weighted mean of covariate k over the
observations still in the risk set at time t. The weight, $dM_i(t)$, is
the change in the martingale residual for the ith subject at time t.

Unlike least-squares residuals, martingale residuals are not
symmetrically distributed around zero. Deviance residuals are also
martingale based and normalize the martingale residuals so that they
are symmetric around zero. These residuals are calculated using:

$$D_i = \mathrm{sign}(M_i(t)) \left\{ -2 \left[ M_i(t) + \delta_i \log(\delta_i - M_i(t)) \right] \right\}^{1/2}$$

$M_i(t)$ is the martingale residual for the ith observation, and where
the martingale is zero, the deviance residual is zero.
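
The chain from Cox-Snell to martingale to deviance residuals can be
sketched in Python as below. The linear predictors and cumulative
baseline hazards passed in are hypothetical numbers, not estimates from
any model in this chapter, and the function names are chosen here for
illustration.

```python
import math

def cox_snell(linear_predictor, cum_baseline_hazard):
    """Cox-Snell residual: exp(beta'x) times the cumulative baseline hazard."""
    return math.exp(linear_predictor) * cum_baseline_hazard

def martingale(event, r_cs):
    """Martingale residual: censoring indicator minus Cox-Snell residual."""
    return event - r_cs

def deviance(event, m):
    """Deviance residual computed from a martingale residual m."""
    # The term delta * log(delta - M) vanishes for censored cases (delta = 0).
    log_term = math.log(event - m) if event == 1 else 0.0
    inner = max(-2.0 * (m + log_term), 0.0)   # guard against rounding below 0
    return math.copysign(math.sqrt(inner), m)

# Hypothetical cases: (event indicator, linear predictor, cumulative hazard).
cases = [(1, 0.0, 0.3), (1, 0.5, 1.2), (0, -0.2, 0.6), (1, 0.0, 1.0)]
for event, lp, h0 in cases:
    r = cox_snell(lp, h0)
    m = martingale(event, r)
    print(event, round(r, 4), round(m, 4), round(deviance(event, m), 4))
```

The last case has a martingale residual of zero and therefore a
deviance residual of zero, as stated above.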

Finally, Schoenfeld residuals are needed to test the proportional
hazards assumption. The Schoenfelds are simply the sum across
observations of the score residuals for each covariate k:

$$s_k(t) = \sum_{i=1}^{N} L_{ik}(t)$$

The Schoenfeld residuals can be thought of as the observed minus the
expected values of the covariates at each failure time,

$$\hat{r}_{ik} = c_i \left( x_{ik} - \bar{x}_{w_i k} \right),$$

and the summation above yields a single value for each covariate at
each time point. These residuals can be used to assess the proportional
hazards assumption.

The Cox model is generally a very robust model; however, as with any
model, diagnostic checking is important. The residuals are used for a
variety of these diagnostics, including linearity in the covariates and
proportional hazards.


4 Covariate functional form 


As with least-squares models, the functional 
form of the covariates must be tested. The Cox 
model assumes that the covariates are loglinear, 
but often continuous variables assume a more 
complicated form. Given the nature of event 
data, nonlinear functional forms are more likely 
in event history models than standard least- 
squares models. 

Failing to detect and correct for nonlinear- 
ity leads to several undesired effects in the 
model (Therneau and Grambsch, 2000; Keele, 
2005). First, the estimates are biased and exhibit 
decreased power of statistical tests. Second, 
the fact that the effect of the covariate differs 
across the values of the explanatory variable 
changes the interpretation of the impact of the 
variable on the hazard rate. Finally, failing to 
detect nonlinearity has consequences for diag- 
nosing and correcting violations of the propor- 
tional hazards assumption. Specifically, tests 
for nonproportionality can fail in the presence 
of an incorrect functional form, and correct- 
ing for a violation of the proportional hazards 
assumption when the failure is due to nonlin- 
earity will not produce the correct model. Thus, 
the functional form of the covariates should be 
assessed prior to testing for nonproportionality 
(Therneau and Grambsch, 2000; Keele, 2005). 
Three methods are available for detecting nonlinearity in the
covariates. The first plots the martingale residuals saved from a Cox
model against each covariate. An alternative test uses a two-step
process in which martingale residuals are saved from a Cox model with
m-1 covariates (where m is the number of covariates in the full model).
Those smoothed martingale residuals are then plotted against the
omitted covariate. The process is repeated for each covariate. The
third method for assessing the linearity of covariates directly models
the functional form with a smoothing spline and provides a statistical
test for the presence of nonlinearity (Keele, 2005). The model is
estimated using smoothing splines for any covariates suspected of
nonlinearity (either because of theory or because of diagnosis using
martingale plots). Then, a Wald test can be used to decide whether a
nonlinear effect should remain in the model.³

We illustrate the importance of assessing covariate functional form in
the Cox model using data on international disputes among 827
“politically relevant” dyads from the period 1950 to 1985 (Oneal and
Russett, 1997; Beck, Katz and Tucker, 1998; Box-Steffensmeier and Zorn,
2001). Each dyad (pair of nations) is observed once for each year it is
in the dataset, for a total of 20,448 observations. The dependent
variable is the duration from the beginning of the period until the
onset of a militarized international dispute between the two nations
that make up the dyad. For pedagogical reasons, we model duration as a
function of just three factors: the level of democracy in the dyad
(scaled from most autocratic (0) to most democratic (1)), the presence
of an alliance between the nations (binary variable where 1 indicates
an alliance), and whether the two nations are geographically contiguous
(binary variable where 1 indicates contiguity). The model has only one
continuous variable to test for nonlinearity. After estimating the Cox
model, we plot the saved martingale residuals against the democracy
covariate using a lowess smoother. Upon examining the plot
(Figure 25.1), it is clear that this covariate violates the linearity
assumption, as the smoothed line is not straight. The alternative
two-step test confirms the violation (Figure 25.2). Based on the
figures, a quadratic transformation of the democracy covariate is
suggested. Now that the linearity of the covariates has been tested, we
are able to perform tests of the proportional hazards assumption.

³ See Keele (2005) for an in-depth discussion of
directly modeling functional forms using smoothing
splines in R.

⁴ The full model can be found in Box-Steffensmeier
and Jones (2001) or Box-Steffensmeier and Zorn
(2001).


Martingale residuals approach 1 
Figure 25.1 Test of democracy covariate functional form

Figure 25.2 Two-step test of democracy covariate functional form


5 Proportional hazards assumption


The Cox model assumes that the hazards of two observations with
different values on one or more covariates differ only by a factor of
proportionality (Box-Steffensmeier and Zorn, 2001). If the proportional
hazards assumption holds, the ratio of the hazards will not differ as a
function of time: it will be the same in the first time period under
study as it is in the last period under study.

To illustrate proportional hazards, suppose we have two observations, A
and B, in our data. The hazard rates at time t for observation A,
$h_A(t)$, and observation B, $h_B(t)$, are proportional for any value
of t. This can be expressed as:

$$h_A(t) = C\, h_B(t)$$

where C is a nonnegative constant equal to the ratio of the two
hazards, which can also be shown as

$$C = \frac{h_A(t)}{h_B(t)}$$

This assumption implies that the ratio of two hazards is constant over
time. The effect of a covariate shifts the hazard rate by a factor of
proportionality regardless of when the event occurs (Box-Steffensmeier
and Zorn, 2001).

As noted above, the hazard rate for the Cox model is given by:

$$h_i(t) = h_0(t) \exp(\beta' x_i)$$

Since the hazard rate for the Cox model is proportional, the ratio of
two hazards can be written as:

$$\frac{h_i(t)}{h_j(t)} = \exp[\beta'(x_i - x_j)]$$

The proportional difference between the two observations is a function
of having different values for the covariates.


Assessing whether or not the proportional haz- 
ards assumption holds is essential when esti- 
mating a Cox model. Violation of the assump- 
tion can lead to biased estimates and decreased 
power in statistical tests (Box-Steffensmeier and 
Jones, 2004). A hazard rate that is increasing over 
time tends to overestimate the impact of covari- 
ates. Alternatively, a hazard rate that is decreas- 
ing over time, or converging, is biased towards 
zero (Kalbfleisch and Prentice, 1980). 
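
The constancy of the hazard ratio under the Cox model can be checked
numerically with a short sketch. The baseline hazard, coefficients, and
covariate values below are hypothetical choices made for illustration;
any positive baseline hazard would give the same ratio.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard that rises with time."""
    return 0.1 * t

def cox_hazard(t, x, beta):
    """Cox hazard: baseline hazard times exp(beta'x)."""
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

beta = [0.5, -1.0]      # hypothetical coefficients
x_i = [1.0, 0.0]        # covariates of observation i
x_j = [0.0, 1.0]        # covariates of observation j

# The baseline hazard cancels, so the ratio is the same at every time.
ratios = [cox_hazard(t, x_i, beta) / cox_hazard(t, x_j, beta)
          for t in (1.0, 5.0, 20.0)]
expected = math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))
print(ratios, expected)
```

Although both hazards rise over time here, their ratio stays at
exp[β'(x_i - x_j)], which is the content of the proportionality
assumption.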

There are numerous substantive reasons we 
may not expect the assumption of proportional 
effects to hold. We may hypothesize that the 
effect of one of the covariates changes over 
time due to factors such as learning, life-course 
changes, or institutionalization. For example, 
life-course changes may lead us to posit that the 
effect of unemployment on recidivism varies 
over time. Or the process of institutionalization 
may lead us to expect that political alliance size 
may be large early in the duration of an alliance, 
but decrease over time (Zorn, 2000). 

The generalized Cox model, which allows hazard ratios to vary over
time, can be expressed as:

$$h_i(t) = h_0(t) \exp[X_i \beta + (X_i\, g(t)) \gamma]$$

where the effects of individual covariates are allowed to vary by some
function $g(\cdot)$ of time. Tests of proportional hazards take
$\gamma = 0$ as the null hypothesis, that is, that any change in the
hazard rate is a function only of the covariates.

Violations of the proportional hazards 
assumption are detected with residual-based 
tests. The first test for proportionality uses the 
Schoenfeld residuals. If proportional hazards 
holds, there should be no relationship between
an observation's residual for a given covariate and
the length of the survival time. A plot of
the Schoenfeld residuals against time reveals
whether the values of the residuals are changing
with respect to time.

Returning to the interstate dispute example 
from above, we estimate a Cox model and 
save the Schoenfeld residuals. Of the three 
covariates, only the allies plot looks as though
the covariate might exhibit nonproportionality 
due to the plotted line not being straight (see 
Figure 25.3). However, plots can be misleading 
and lack a clear diagnosis of nonproportional- 
ity; therefore, statistical tests are recommended 
in addition to residual plots. 

Figure 25.3 Test of PH assumption for democracy
Figure 25.4 Test of PH assumption for allies
Figure 25.5 Test of PH assumption for contiguity

Terry Therneau, Patricia Grambsch and Thomas Fleming (1990)
first developed a statistical test based on Schoenfeld
residuals to detect a global violation of the proportional
hazards assumption, i.e., a violation of the proportional
hazards assumption for the model as a whole. This global test
uses the maximum of the absolute value of the residuals summed
over time. A second residual-based statistical test evaluates
the proportionality of each covariate; Harrell's (1986) ρ is
the correlation between each covariate's Schoenfeld residuals
and the rank of survival time. Statistical significance is
based on the chi-square distribution, where the null is ρ = 0,
or no correlation between the residuals and time. Grambsch and
Therneau (1994) modify this test by using the scaled residuals,
and also provide an improved global test for nonproportionality
based on the aggregated (across covariates) covariance between
the unscaled Schoenfeld residuals and survival time.
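The per-covariate correlation described above can be sketched in a few lines. This is a minimal illustration, not the procedure any particular package implements: it assumes the Schoenfeld residuals for one covariate have already been obtained from a fitted model, the function name `harrell_rho` and all numbers are hypothetical, and it returns only the correlation, not the chi-square significance test.

```python
import numpy as np

def harrell_rho(residuals, times):
    """Pearson correlation between one covariate's Schoenfeld residuals
    and the rank of the corresponding event times."""
    residuals = np.asarray(residuals, dtype=float)
    times = np.asarray(times, dtype=float)
    # Rank event times 1..n (ties broken by order of appearance in this sketch).
    ranks = np.argsort(np.argsort(times, kind="stable"), kind="stable") + 1.0
    return float(np.corrcoef(residuals, ranks)[0, 1])

# Hypothetical residuals at five event times (not the interstate dispute data).
times = [2.0, 5.0, 1.0, 9.0, 4.0]
flat = [-0.4, 0.2, 0.2, -0.1, 0.0]    # no linear trend with the rank of time
trend = [-0.3, 0.2, -0.5, 0.6, 0.0]   # residuals grow with the rank of time

print(harrell_rho(flat, times))
print(harrell_rho(trend, times))
```

A correlation near zero is what proportional hazards would lead us to expect; a correlation far from zero, as in the second series, is the pattern the test is designed to flag.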

Table 25.1 presents the statistical tests of the proportional
hazards assumption for the interstate disputes example. The
column designated ρ reports the estimated correlation between
the scaled residuals and ln(Time), while the χ² and p-values
indicate the confidence with which we can reject the null
hypothesis that the hazard ratios for different values of that
covariate are constant over time. The global test shows a
problem with the proportional hazards assumption (p < .001).
In addition, the quadratic democracy covariate and the allies
covariate both have p-values lower than conventionally accepted
levels (p < .05). In this example, the residual plots might
have been misleading because the plot of democracy on time did
not appear nonproportional.


Table 25.1 Statistical tests of the proportional hazards assumption

Covariates     ρ         χ²       d.f.   p-value
Democracy      0.086     4.36     1      0.0368
Allies         0.146     23.47    1      0.0000
Contiguity     -0.052    3.04     1      0.0813
Global test              26.19    3      0.0000


Table 25.2 Comparing Cox regression estimates

                    Original model      Model correcting    Model correcting for
                                        for nonlinearity    nonlinearity and
                                                            nonproportionality
Covariates          Coefficient (s.e.)  Coefficient (s.e.)  Coefficient (s.e.)
Democracy           -1.03 (0.22)*
Democracy²                              -1.20 (.24)*        -1.62 (.45)*
Democ² x ln(Time)                                           .17 (.17)
Allies              -0.37 (0.17)*       -0.37 (.17)*        -1.11 (.29)*
Allies x ln(Time)                                           .35 (.11)*
Contiguity          1.47 (0.17)*        1.48 (.17)*         1.48 (.17)*
R²                  0.32                0.33                0.30
lnL                 -2523.09            -2517.93            -2512.20
N                   20448               20448               20448

Note: Efron method used for ties. Coefficients are Cox proportional hazards estimates with robust
standard errors in parentheses. One asterisk indicates p < .05.

To correct for nonproportionality, the offending covariate is
interacted with some function of time; usually the interaction
is with ln(Time). Both the offending covariate and the covariate
interacted with time are included in the new model.⁵ Both the
quadratic democracy covariate and the allies covariate are
interacted with ln(Time) and included in the new Cox model.
Table 25.2 presents the Cox model with nonproportionality and
the Cox model with log-time interactions.


⁵ Testing the statistical significance of time-interacted
terms has been posited as a third diagnostic technique
for detecting nonproportionality. However, using this
method to detect and correct for violations of the
assumption is not recommended as it is the approach
for correcting the problem as well (Box-Steffensmeier
and Jones, 2004; Grambsch and Therneau, 1994).


When we examine the original model against 
models that correct for nonlinearities and non- 
proportionality, it is easy to see that failing 
to test the assumptions of the model could 
have serious consequences. While the variables 
retained the same direction and statistical sig- 
nificance, the impact of the covariates on dura- 
tion times does change (see Table 25.2). The 
coefficient for democracy changes from -1.03 to
-1.20 when accounting for the correct functional
form and to -1.62 when accounting for functional
form and nonproportionality. Similarly,
the coefficient for allies changes from -0.37 to
-1.11 when accounting for nonproportionality.
In addition to testing linearity and proportional 
hazards assumptions, the researcher should per- 
form diagnostics for outliers and leverage and 
can assess the fit of the model. 


6 Other diagnostics 


Residual-based diagnostics have been devel- 
oped to test for outliers, influence, and the 
adequacy of the model. We can use the deviance
residuals to assess the model in terms of the 
ith observation to identify outliers. Outliers are 
problematic for the model because they can 
lead to erroneous conclusions about the haz- 
ard rate. A plot of deviance residuals against 
the observation numbers demonstrates which 
observations need to be examined more closely. 
In addition, a plot of deviance residuals against 
duration time can provide some initial insight 
into the adequacy of the specified Cox model 
(see Box-Steffensmeier and Jones, 2004, for 
elaboration). 

Score residuals, on the other hand, are used 
to measure the influence of an observation on 
the size of the coefficient. An observation with 
influence on the size of the coefficient tempers 
the claims that a researcher can make from the 
model. The matrix of score residuals, along with
the variance-covariance matrix, creates a measure
analogous to the dfbeta used with least-squares
models. Dfbeta measures the influence
of the ith individual on the jth coefficient. In
other words, multiplying the score residuals by
the variance-covariance matrix provides a measure
of how much each observation increases
or decreases a given coefficient.
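The matrix product described above can be sketched as follows. This is an illustration under stated assumptions: the score residuals and the variance-covariance matrix are taken as already available from a fitted Cox model, the function name `dfbeta_from_score` is hypothetical, and the numbers are invented for the example.

```python
import numpy as np

def dfbeta_from_score(score_resid, vcov):
    """Approximate influence of each observation on each coefficient.

    score_resid: (n, p) matrix of score residuals, one row per observation
    vcov:        (p, p) variance-covariance matrix of the coefficients
    Returns an (n, p) matrix; entry [i, j] approximates how much
    observation i shifts coefficient j.
    """
    score_resid = np.asarray(score_resid, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    return score_resid @ vcov

# Hypothetical values for three observations and two covariates.
score = [[0.5, -0.1],
         [-0.2, 0.3],
         [-0.3, -0.2]]
vcov = [[0.04, 0.00],
        [0.00, 0.01]]

influence = dfbeta_from_score(score, vcov)
print(influence)
# Index of the observation with the largest absolute influence on coefficient 0.
print(int(np.argmax(np.abs(influence[:, 0]))))
```

Rows with unusually large entries point to observations worth examining more closely before drawing conclusions from the corresponding coefficient.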

Cox-Snell residuals are used to assess the fit 
of the model. Recall from above that Cox-Snell 
residuals can be thought of as the expected 
number of events in a given time interval or 
the expected count. Therefore, in using these 
residuals we can better understand how well 
the model matches the data. If the model fits, the Cox-Snell
residuals, $r_{CS_i}$, follow a unit exponential distribution,
so a plot of the residuals against the integrated
(or cumulative) hazard rate based on the residuals
should yield a straight line through the origin
with a slope equal to 1 (a 45-degree angle).
In the example of interstate disputes, the plot of
the Cox-Snell residuals against the cumulative
hazard rate of the residuals departs from this line
enough to be considered a concerning lack of fit
(see Figure 25.6). We should be concerned that the
model has not been well specified.
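The logic of this check can be illustrated with simulated residuals. The sketch below is not the analysis behind Figure 25.6: it assumes residuals are already in hand, uses a Nelson-Aalen estimate for the cumulative hazard of the residuals (one common choice, named here as an assumption), ignores ties, and the helper names are hypothetical.

```python
import numpy as np

def nelson_aalen(residuals, events):
    """Nelson-Aalen cumulative hazard, treating residuals as 'times'.

    residuals: Cox-Snell residuals
    events:    1 if the event was observed, 0 if censored
    Ties are not handled specially in this sketch.
    """
    r = np.asarray(residuals, dtype=float)
    d = np.asarray(events, dtype=float)
    order = np.argsort(r, kind="stable")
    r, d = r[order], d[order]
    at_risk = len(r) - np.arange(len(r))  # observations with residual >= r[k]
    return r, np.cumsum(d / at_risk)

def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * y) / np.sum(x * x))

rng = np.random.default_rng(0)
good = rng.exponential(1.0, size=2000)  # simulated: unit exponential, as under a good fit
bad = rng.exponential(2.0, size=2000)   # simulated: wrong scale, as under a poor fit
events = np.ones(2000)                  # no censoring in this toy example

slope_good = slope_through_origin(*nelson_aalen(good, events))
slope_bad = slope_through_origin(*nelson_aalen(bad, events))
print(slope_good, slope_bad)
```

A slope near 1 corresponds to the 45-degree line; the second series, whose scale is deliberately wrong, produces a slope well below 1.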
Figure 25.6 Cox-Snell residuals from interstate dispute data


While the plot of the Cox-Snell residuals is 
commonly used for evaluating the fit of the 
model, alternative techniques are being devel- 
oped to indicate how well the model performs. 
A new R² statistic for survival models allows
us to assess the goodness of fit by measuring
the amount of explained variation. In duration
data, researchers are interested in how
much of the variation in the survival time is
accounted for by the model. However, unlike
least-squares models, which do not have censoring,
the R² statistic for Cox models must take
into account the number of uncensored observations
(observations experiencing the event)
(Royston, 2006).⁶ A higher R² does not
necessarily indicate that a model better fits the
data. Rather, the R² provides an understanding
of how the model accounts for the variation in
survival times. In the interstate dispute example,
R² = 0.30, which we interpret to mean that 30% of
the variation in survival time can be explained by
the model. When used with other measures of
model fit, researchers can use this statistic for
a better understanding of the adequacy of the
model.


⁶ See Royston (2006) for an in-depth explanation
of the computation of the R² for survival models.
Royston provides syntax for using the statistic in
Stata.
7 Interpreting a Cox model


The Cox model coefficients are parameterized
in terms of the hazard rate $h_i(t) = h_0(t) \exp(\beta' x)$,
where $h_0(t)$ is the baseline hazard rate and $\beta' x$
are the covariates and regression parameters.
Thus, a negative coefficient indicates that the 
hazard is decreasing as a function of the covari- 
ate and survival time is increasing. Conversely, 
a positive coefficient indicates that the hazard 
is increasing as a function of the covariate and 
survival time is decreasing. 

In the example of interstate disputes, the coef- 
ficients for democracy and allies are both neg- 
ative, indicating that the hazard is decreasing 
as the values of the covariates are increasing, 
and survival times are increasing. Conversely, 
a positive coefficient for contiguity indicates 
that the hazard is increasing when the dyad is 
contiguous. 

Since the parameter estimates for the Cox 
model reveal information regarding the hazard 
rate, we can provide a more substantive interpretation
of the findings. First, the hazard ratio
can be found by exponentiating the coefficient.
A hazard ratio of less than 1 indicates that risk
decreases as the covariate increases, and a hazard
ratio of greater than 1 indicates that the
risk increases as the covariate increases. Further,
a hazard ratio close to 1 suggests that
the hazard rate does not change as a function
of the covariate. Hazard ratios are readily
interpretable for binary covariates. For example, the
allies coefficient of -1.11 indicates that the hazard
is decreasing as a function of the covariate
and that survival times are increasing. Or,
interpreted as a hazard ratio, the risk of an interstate
dispute when the dyads are allies is .33
(exp(-1.11)) times the risk when dyads are not
allies, i.e., 67% lower. However, when states are
contiguous, the risk of interstate dispute is 4.39
(exp(1.48)) times the risk when the nations are not
contiguous.

An additional tool for interpreting the Cox
model examines the percent change in the hazard
rate as a function of the covariate. The percent
change is calculated as:

$$\%\Delta h(t) = \left[\frac{\exp(\beta(x_i = X_1)) - \exp(\beta(x_i = X_2))}{\exp(\beta(x_i = X_2))}\right] \times 100$$

where $x_i$ is the covariate and $X_1$ and $X_2$ are
different values of the covariate. For example,
contiguity with a coefficient of 1.48, as
above, impacts the hazard rate with an increase
of 339%:

$$\%\Delta h(t) = \left(\frac{\exp(1.48(1)) - \exp(1.48(0))}{\exp(1.48(0))}\right) \times 100 = 339\%$$
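The hazard ratio and percent-change calculations can be checked with a few lines of arithmetic. The sketch below uses the contiguity and allies coefficients reported in Table 25.2; the function names are hypothetical, and the comparison values of the covariate default to 1 versus 0, as for a binary covariate.

```python
import math

def hazard_ratio(beta, x1=1.0, x2=0.0):
    """Ratio of hazards when the covariate equals x1 rather than x2."""
    return math.exp(beta * (x1 - x2))

def percent_change(beta, x1=1.0, x2=0.0):
    """Percent change in the hazard when the covariate moves from x2 to x1."""
    return (math.exp(beta * x1) - math.exp(beta * x2)) / math.exp(beta * x2) * 100.0

print(round(hazard_ratio(1.48), 2))     # contiguity: 4.39
print(round(percent_change(1.48)))      # contiguity: 339
print(round(hazard_ratio(-1.11), 2))    # allies main effect: 0.33
print(round(percent_change(-1.11)))     # allies main effect: -67
```

Note that the percent change is simply the hazard ratio minus one, multiplied by 100, so the two summaries carry the same information.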


When examining the substantive impact of 
the other covariates, however, we need to take 
into account the time-interactions included for 
nonproportionality. When the coefficients for a 
covariate and a time-interacted covariate have 
the same sign, the hazard ratios diverge over 
time, but when the signs are opposite, the haz- 
ard rates converge and then diverge (Teachman 
and Hayward, 1993). The percent change in 
hazard rate for time-interacted covariates can 
be calculated from: 

$$\%\Delta h(t) = \left[\frac{\exp[\beta_k(x_i = X_1) + \beta_{kt}(x_i = X_1)\ln(T)] - \exp[\beta_k(x_i = X_2) + \beta_{kt}(x_i = X_2)\ln(T)]}{\exp[\beta_k(x_i = X_2) + \beta_{kt}(x_i = X_2)\ln(T)]}\right] \times 100$$


where $\beta_k$ is the coefficient for the original
covariate, $\beta_{kt}$ is the coefficient for the
interaction, and T is a specific time. This calculation
takes into account the change of the hazard rate
over time.
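A small function makes the behavior of this formula over time easier to see. The sketch below uses hypothetical coefficients (a main effect of -1.0 and a log-time interaction of 0.5), not the estimates from the interstate dispute model, and the function name is an assumption for illustration.

```python
import math

def percent_change_tvc(beta, beta_t, t, x1=1.0, x2=0.0):
    """Percent change in the hazard at time t when the covariate moves
    from x2 to x1, for a covariate interacted with ln(time)."""
    lp1 = beta * x1 + beta_t * x1 * math.log(t)
    lp2 = beta * x2 + beta_t * x2 * math.log(t)
    return (math.exp(lp1) - math.exp(lp2)) / math.exp(lp2) * 100.0

beta, beta_t = -1.0, 0.5   # hypothetical coefficients with opposite signs

for t in (1.0, 5.0, math.exp(2.0), 20.0):
    print(round(t, 2), round(percent_change_tvc(beta, beta_t, t), 1))
```

At t = 1 the log of time is zero, so the result reduces to the main-effect calculation. Because the two hypothetical coefficients have opposite signs, the effect shrinks toward zero, crosses it where beta + beta_t ln(t) = 0 (here t = exp(2)), and then grows in the opposite direction, which is the converge-then-diverge pattern described in the text.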

For example, the allies covariate was interacted
with the log of time due to violations of
the proportional hazards assumption. Substantively,
the impact on the hazard rate of a dyad
being an ally would be a 67% decrease without
taking into account the time interaction. When
both allies and allies x ln(Time) are considered, the
percent change in the hazard rate for a dyad
that is allied compared to a dyad that is not
allied is a decrease of 53% at T = 1 but an increase
of 90% at T = 5. Table 25.3 provides a substantive
explanation of the impact of each covariate on
the hazard rate.


Table 25.3 Cox regression of the timing of interstate disputes

Covariates          Coefficient (s.e.)  Hazard ratio  Change in X  Impact of the covariate
                                                                   on the hazard rate
Democracy²          -1.62* (.45)        0.20                       -77% (T = 1)
Democ² x ln(Time)   .17 (.17)           1.19          0,1          8% (T = 10)
Allies              -1.11* (.29)        0.33                       -53% (T = 1)
Allies x ln(Time)   .35* (.11)          1.42          0,1          90% (T = 5)
Contiguity          1.48* (.17)         4.39          0,1          339%
R²                  0.30
lnL                 -2512.20
N                   20448

Note: Efron method used for ties. Coefficients are Cox proportional hazards estimates with robust
standard errors in parentheses. One asterisk indicates p < .05.

A final way of substantively interpreting 
the model is to examine the baseline hazard 
rate. The baseline hazard rate is not calculated 
directly by the Cox model, but can be retrieved 
in the event that the researcher is interested. 
Graphing the baseline hazard rate can provide 
a useful illustration of the model (see Box- 
Steffensmeier and Jones, 2004, for further expla- 
nation); however, researchers are usually more 
interested in the impact of the covariates on 
the hazard. An important substantive motiva- 
tion for using duration models is an implicit 
interest in risk or timing of an event. Providing 
a substantive interpretation of the model results 
provides the researchers with the tools to fully 
characterize the process under examination. 


8 Cox modeling extensions and 
sources of dependence 


Thus far, we have discussed the Cox model 
with respect to single events; however, the flex- 
ible Cox model is amenable to multiple events 


data. The most important categories of multiple 
events to consider are unordered and ordered. 
Unordered multiple events refer to the situa- 
tion where important substantive distinctions 
are drawn about the event. For example, it is not
just that a cabinet government failed but how
it failed, e.g., whether it was dissolved and early
parliamentary elections were called or the incumbent
cabinet was directly replaced by a new one
(Diermeier and Stevenson, 1999). Or for a study
of unemployment duration, Addison and Por- 
tugal (2003) argue that it is important to dis- 
tinguish exit from unemployment by finding 
a job or inactivity. When unordered multiple 
events are considered, the Cox model is strati- 
fied by event type. This allows the baseline haz- 
ard to differ by event type. Each stratum has its 
own baseline hazard function, while the covari- 
ates are constrained to be the same across the 
different strata. This model is the well-known 
competing risks model. 

Ordered multiple events are generally 
referred to as repeated events. For example, 
patients may suffer multiple heart attacks, crim- 
inals may return to prison numerous times, or 
countries experience multiple civil wars. Event 
history models for repeated events explicitly 
incorporate the reality that the risk of expe- 
riencing an event may change once the event 
has already been experienced. That is, previous
event occurrence impacts the hazard of expe- 
riencing the event again, so analysts cannot 
assume the events are independent. 

In repeated events data, two types of depen- 
dence are possible: event dependence and case 
dependence. Event dependence occurs when an 
event is conditional on and influenced by a pre- 
vious occurrence. Experiencing one heart attack 
may weaken the heart and make the individual 
more likely to suffer another. Detecting event
dependence is important because the risk of later
events may be conditioned by a previous experience;
a model that does not stratify by event number
does not allow the influence of an earlier event
to change the hazards for later events. Case
dependence, also referred to as heterogeneity,
occurs because observations in repeated processes
are correlated. There may be unmeasured,
unmeasurable, or unimagined factors that affect
whether or not the observation experiences the
event. For example, it is widely held that culture
and history affect the likelihood of a country
experiencing civil
wars. This heterogeneity is important to take
into account for accurate inferences. If we do 
not account for heterogeneity in the model, i.e., 
if we treat all observations as independent, we 
overstate the amount of information provided 
by each case and produce incorrect standard 
errors. When analyzing repeated events data, 
we want a method that can detect and correct 
for event dependence and heterogeneity. 

The conditional frailty model is equipped 
to handle both event dependence and hetero- 
geneity (Box-Steffensmeier and DeBoef, 2006). 
The model accounts for event dependence by 
stratifying and for heterogeneity by incorporat- 
ing a frailty term. The model is presented as: 


$$\lambda_{ik}(t) = \lambda_{0k}(t) \exp(X_{ik}\beta + \mu_i)$$


where $\lambda_{ik}(t)$ is the case's risk for event k as
a function of an event-specific baseline hazard,
$\lambda_{0k}(t)$, and a case-specific random effect,
$\mu_i$. If the model exhibits event dependence, the
event-specific baseline hazards will show separation.
If the model exhibits heterogeneity, the
frailty term, or random effect, is statistically sig- 
nificant. Separately incorporating event depen- 
dence and heterogeneity into the model allows 
the sources of dependence to be separated and 
correct inferences to be made about the effect 
of the covariates in the model. 
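To make the two sources of dependence concrete, the sketch below evaluates a hazard of this form for one case. Everything in it is illustrative: the baseline hazard values for the first and second events, the coefficient, and the frailty values are hypothetical, and in practice the baseline hazards and the random effect are estimated rather than supplied.

```python
import math

def conditional_frailty_hazard(baseline_k, x, beta, frailty):
    """Hazard for a case at event number k: the event-specific baseline
    hazard scaled by exp(x'beta + case-specific random effect)."""
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    return baseline_k * math.exp(linear_predictor + frailty)

# Hypothetical baseline hazards by event number (event dependence:
# the second event has a higher baseline than the first).
baseline = {1: 0.010, 2: 0.025}
beta = [-0.2]   # hypothetical coefficient for one binary covariate
x = [1.0]

for k in (1, 2):
    for frailty in (-0.4, 0.0, 0.4):   # heterogeneity across cases
        h = conditional_frailty_hazard(baseline[k], x, beta, frailty)
        print(k, frailty, round(h, 4))
```

Holding the covariate fixed, the hazard differs across event numbers because of the stratified baseline and across cases because of the random effect, which is how the model keeps the two sources of dependence separate.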

We illustrate the conditional frailty model 
using data on the duration spent in foster 
care. If heterogeneity is the underlying prob- 
lem, then efforts to reduce event rates are best 
spent searching for ways to target the needs 
of specific types of children with dispropor- 
tionate use of the system and to change the 
conditions associated with churning, i.e., place- 
ment instability. Churning is an important pol- 
icy consideration because it has been linked to 
weakened attachment to a child’s primary care 
giver and to emotional and behavioral prob- 
lems as well as school failure, criminal activ- 
ity, and early parenthood (Cook et al., 1991; 
Fanshel et al., 1990; Goldstein et al., 1973; 
Leiberman, 1987; Zimmerman, 1982). If event 
dependence is itself quite high, then just being 
in the system fosters further time in the sys- 
tem and legislation designed to limit time and 
multiple placements, such as the Adoption and 
Safe Families Act of 1997, should be quite effec- 
tive. In addition to disentangling the effects of 
event dependence and heterogeneity, the model 
is critical for obtaining accurate assessments of 
the effects of measured covariates like sex or 
age. The model is underspecified, but useful 
for pedagogical purposes to illustrate not only 
why we may be interested in separating out 
event dependence and heterogeneity, but how 
to interpret such results. 

We use foster care data from the State of 
Tennessee’s Department of Children’s Services, 
obtained with the help of the Chapin Hall Cen- 
ter for Children at the University of Chicago. We 
currently have data on children placed for the 
first time in 2000 and 2001. Their placement 
histories are observed through December 31,
2003. Table 25.4 shows that the variance of the
random effect is statistically significant, pro- 
viding evidence of heterogeneity across indi- 
viduals. Controlling for heterogeneity allows us 
to make correct inferences about covariates in 
the model. Evidence of event dependence is 
also apparent. Figure 25.7 shows the cumula- 
tive baseline hazards, which vary by event num- 
ber even after accounting for heterogeneity via 
a random effect. This suggests that placements 
are event dependent, with more frequent place- 
ments leading to further disruptions and more 
movement via foster care placements. 


Table 25.4 Conditional frailty model, placements

Covariate   Estimate   Robust SE
urban       -0.209     0.034
gender      0.061      0.020
black       0.168      0.025
hispanic    0.012      0.054
other       -0.075     0.051

Variance of random effect = 0.15, statistically significant (p-value of 0.00)

Figure 25.7 Conditional (gamma) frailty model: placements data


Another useful extension of the Cox model is 
a multilevel approach. Both a basic Cox model
and the conditional frailty model can be specified
as multilevel models. For example, we could examine
differences across the twelve administrative
regions in Tennessee or add data for additional
states and examine state policy differences.
Gifford and Foster (2005) find in their
cross-classified, multilevel model that facility-level
factors are a key determinant of inpatient
length of hospitalization. Facility-level factors
explain a greater proportion of the overall vari- 
ation than do even individual characteristics 
(Gifford and Foster, 2005). So, appropriate anal- 
yses may need to allow for the fact that dura- 
tion spells can be nested within individuals and 
perhaps facilities/providers, schools, congres- 
sional districts, states, etc. 

Finally, spatial dependence is another type 
of dependence that may arise in duration 
models and is a particularly promising exten- 
sion. Subjects may be dependent upon observa- 
tions related in spatial proximity. For example, 
neighboring states may influence one another 
or be dependent on one another. Standard 
survival models estimate the impact of vari- 
ables on the timing or risk of an event but 
do not, however, provide a rigorous or gen- 
eralized mechanism for modeling this spatial 
dependence. In the past, survival models have 
incorporated a spatial element via a dummy 
variable for contiguity or a proportional mea- 
sure of the number of neighbors previously 
experiencing the event (Berry and Berry, 1990; 
Volden, 2006). While these methods attempt 
to theoretically incorporate spatial dependence, 
they do not capture the simultaneity and mul- 
tidirectionality of spatial dependence. More- 
over, the lagged proportional measure and spa- 
tial influence is conditional and unidirectional, 
and using this type of measure when the 
process is simultaneous and multidirectional 
results in biased estimates of spatial influ- 
ences (Anselin, 1988). Bayesian spatial survival 
models incorporate spatial dependence via a
frailty term. In these models, the unobserved
shared risk of experiencing an event is param- 
eterized as a function of spatial proximity 
between neighboring observations. Darmofal 
and Young (2006) provide innovative work 
assessing the adequacy of duration models in 
the face of spatial dependence. 


9 Conclusion 


In using the Cox model, researchers have 
opened a window to investigating the process of 
events. By understanding the timing of events 
and how covariates impact timing, we better 
understand how the process unfolds. Tradi- 
tional models may indicate what variables are 
statistically significant for the occurrence of an 
event, but some of those variables may cause an 
event to occur much faster than others. The Cox 
model is a flexible and robust tool that equips 
the researcher for investigating these processes. 
Performing rigorous diagnostics ensures that 
the model produces accurate inferences into the 
process at hand. Furthermore, state-of-the-art 
extensions of the Cox model, such as the con- 
ditional frailty model and the Cox model with 
spatial dependence, open the field even wider. 
Armed with these new techniques, researchers
can provide insights into more complicated event
processes.
Chapter 26


Parametric event history analysis: an 
application to the analysis of recidivism 
Hee-Jong Joo 


1 Introduction 


Event history (or survival analysis¹) has been
developed and used for the analysis of long- 
itudinal data on the occurrence of events 
(see Allison, 1984; Namboodiri and Suchin- 
dran, 1987; Schmidt and Witte, 1988; Blossfeld, 
Hamerle and Mayer, 1989; Yamaguchi, 1991; 
Singer and Willett, 2003; Box-Steffensmeier 
and Jones, 2004). It is a general methodology 
for studying a transition from one event (or 
state) to another, and research interest cen- 
ters on whether and, if so, when events occur. 
In event history analysis, we can parameterize 
both the probability of an event’s occurrence 
and the timing of the event for those who will 
ultimately experience the event. This method is 
appropriate when the variable of interest is the 
time interval (e.g., years, months, days, or sec- 
onds) between the initial event (e.g., marriage) 
and a subsequent terminal event (e.g., divorce). 


¹ Event history analysis is often referred to as survival
analysis. In biomedical and engineering research, for 
example, much of the literature on event history 
methods goes by the name of survival analysis. It 
is also referred to as duration models, failure-time 
models, and reliability models. 


Social scientists are interested in various 
kinds of events and concerned with the pat- 
terns and correlates of the occurrences of events 
(Yamaguchi, 1991). An event is made up of 
some “qualitative change” that happens at a 
specific point in time. For work and career 
researchers, for instance, job changes, promo- 
tions, layoffs, and retirements can be thought 
of as major events. Demographers study births, 
deaths, marriages, divorces, and migrations. 
The major events of interest in criminologi- 
cal studies include crimes, arrests, convictions, 
and incarcerations. In event history analysis, 
event occurrence is defined in terms of an indi- 
vidual’s transition from one state to another. In 
a recidivism study, for example, the first state is 
“released from prison” and the second state is 
“returned to prison.” The term “survival” here 
describes a continuation of the state of being 
“released from prison” and is thus the opposite 
of recidivism. The survival probabilities then 
can be defined as the cumulative proportion 
surviving at the end of a specified time inter- 
val (e.g., month or year) or 1 minus the recidi- 
vism rate. 

Another key concept in event history analysis 
is the hazard rate or hazard function. The hazard
rate, h(t), is the probability that a person who
has not experienced an event (e.g., reincarceration)
by the beginning of a specified time interval
(e.g., month) will experience the event (e.g.,
return to prison) during that interval, given that
the individual is at risk at that time. In each
time interval, the hazard rate can be calculated 
by dividing the number of events by the number 
of individuals at risk. The hazard rate, which 
is an unobserved variable and usually varies 
with time and among groups, is the “funda- 
mental” dependent variable in an event history 
model. For most applications, however, the hazard
rate is re-expressed by 1/h(t) so that the dependent
variable is transformed into the expected
length of time until an event occurs (T or Log 
T). One of the most important characteristics 
of the hazard rate is that “it controls both the 
occurrence and the timing of events” (Allison, 
1984, p. 16). 
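The interval-by-interval calculation described above can be sketched as a simple life table. The counts below are hypothetical (a cohort of 100 releasees followed over three intervals), the function name is an assumption, and censored cases are removed from the risk set only at the end of the interval in which they are censored, which is one of several possible conventions.

```python
def life_table(n_start, events, censored):
    """Return (at_risk, hazard, survival) for each interval.

    hazard   = events in the interval / number at risk at its start
    survival = cumulative proportion not yet experiencing the event
    """
    rows = []
    at_risk = n_start
    survival = 1.0
    for d, c in zip(events, censored):
        hazard = d / at_risk
        survival *= 1.0 - hazard
        rows.append((at_risk, hazard, survival))
        at_risk -= d + c   # events and censored cases leave the risk set
    return rows

# Hypothetical recidivism counts for three follow-up intervals.
table = life_table(n_start=100, events=[10, 18, 9], censored=[0, 0, 12])
for at_risk, hazard, survival in table:
    print(at_risk, round(hazard, 3), round(survival, 3))
```

In this example the survival column is 1 minus the cumulative recidivism rate, matching the definition of survival probabilities given earlier.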

This statistical method allows researchers to 
examine not only individuals’ survival or fail- 
ure in terms of survival probability or haz- 
ard rate, but also the relationship between the 
length of the survival time and independent 
variables, or covariates, of theoretical inter- 
est (Box-Steffensmeier and Jones, 2004). Event 
history analysis, for example, enables crimi- 
nologists to examine not only whether or not 
an individual was rearrested during a certain 
follow-up period (whether events occur), but 
also how many days after release from prison
parolees were rearrested (when
events occur), with some attention to the fac- 
tors related to the probability and the timing 
of an event. Event history analysis thus pro- 
vides a different way of viewing recidivism and 
allows comparison among various groups. As 
Allison (1984) argues, the best way to study 
events and their causes is to collect event history
data and examine the patterns and correlates 
of the occurrences of events with event history 
analysis. To determine whether a research ques- 
tion calls for event history analysis, it is helpful 
to conduct the above-mentioned “whether and 
when test” (Singer and Willett, 2003, p. 306). 


2 Problems of conventional methods 
in the analysis of event history data: 
recidivism as an example 


In the past, recidivism has been, in most cases, 
reported merely as the percentage of parole 
releasees who returned to prison within a cer- 
tain period of follow-up. However, the predic- 
tion of parole outcome is not limited simply 
to success or failure. In many cases, there exist 
substantial differences in the timing of recidi- 
vism across demographic and criminal behavior 
characteristics of each recidivist. Thus, the tim- 
ing of an event of interest is recognized as an 
important factor in the recidivism study. 

In previous studies of recidivism (e.g., Rossi 
et al., 1980), however, ordinary multiple regres- 
sion methods were applied to event history 
data. The events of interest were either arrests, 
convictions, or incarcerations, and the aim was 
to determine how the probability of an event 
depends on several explanatory variables such 
as age at release, gender, race/ethnicity, educa- 
tion, and prior criminal behavior. The depen- 
dent variable is a dummy variable indicating 
whether or not an individual was rearrested 
(or reconvicted or reincarcerated) during a cer- 
tain follow-up period (12 months in the Rossi 
et al. study). As Allison (1984, pp. 10-11) 
pointed out, however, this method is still 
not ideal: 


Aside from the well-known problems in the use 
of ordinary least squares with a dummy depen- 
dent variable (Hanushek and Jackson, 1977, Ch. 7), 
dichotomizing the dependent variable is arbitrary 
and wastes information. It is arbitrary because there 
was nothing special about the 12-month dividing 
line except that the study ended at that point. Using 
the same data, one might just as well compare those 
arrested before or after the six-month mark. It wastes 
information because it ignores the variation on either 
side of the dividing line. One might suspect, for 
example, that someone arrested immediately after 
release had a higher propensity toward criminal 
activity than someone arrested 11 months later. 
In an effort to avoid these problems, the
length of time from release to arrest or return to
prison could be used as the dependent variable
in a multiple regression. However, a substantial
proportion of each cohort did not return to
prison during the follow-up period, so their
durations are censored. Censoring exists when incomplete
information is available about the duration 
of the risk period due to a limited time of 
observation. Exclusion of the censored cases 
can lead to severe bias or loss of informa- 
tion in the parameter estimates of conven- 
tional statistical procedures such as ordinary 
least square (OLS) regression. As an alterna- 
tive solution, one might assign the maximum 
length of time observed as the value of the 
dependent variable for the censored cases. But 
this obviously underestimates the true value 
and substantial bias may result (Allison, 1984; 
Yamaguchi, 1991). 
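
The two naive treatments of censored cases described above can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: it assumes exponentially distributed times to return with a hypothetical true mean of 40 months and a 36-month follow-up, and compares (a) dropping censored cases, (b) assigning the maximum observed time to censored cases, and (c) the standard maximum likelihood estimate for censored exponential data (total observed time divided by the number of events).

```python
import random

def simulate(n=5000, true_mean=40.0, follow_up=36.0, seed=1):
    """Simulate (observed_time, event) pairs with right censoring.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        t = rng.expovariate(1.0 / true_mean)   # true time until return
        event = t <= follow_up                  # False means censored
        records.append((min(t, follow_up), event))
    return records

TRUE_MEAN = 40.0
records = simulate(true_mean=TRUE_MEAN)

event_times = [t for t, event in records if event]
total_time = sum(t for t, _ in records)

# (a) Drop censored cases and average the remaining durations.
mean_dropping_censored = sum(event_times) / len(event_times)

# (b) Treat the censoring time as if it were the true duration.
mean_max_assigned = total_time / len(records)

# (c) Censored-data maximum likelihood estimate for an exponential model.
mean_censored_mle = total_time / len(event_times)

print(f"drop censored cases : {mean_dropping_censored:6.2f}")
print(f"assign maximum time : {mean_max_assigned:6.2f}")
print(f"censored-data MLE   : {mean_censored_mle:6.2f}")
print(f"true mean           : {TRUE_MEAN:6.2f}")
```

Under these assumptions both naive estimates fall well below the true mean, while the estimate that uses the censored cases as partial information does not.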

In addition to this censoring problem, there 
are also difficulties incorporating time-varying 
explanatory variables in a multiple regression 
which predicts timing of an event (Sorensen, 
1977; Tuma and Hannan, 1978; Allison, 1984; 
Ekland-Olson et al., 1991). If a study intends to 
examine possible causes of events or to deter- 
mine how the probability of an event depends 
on several explanatory variables, the event 
history should also include data on possible 
time-varying explanatory variables. While some 
explanatory variables, such as race and sex, are 
constant over time, other variables (e.g., income, 
marital status, and age) may vary with time. 
To avoid these two typical problems—censoring 
and the mishandling of time-varying explana- 
tory variables—with the conventional multiple 
regression approach to event history data, event 
history analysis has been used in many areas of 
the social and behavioral sciences. 


3 Parametric versus nonparametric 
event history methods 


Within event history analysis, there are several 
approaches to analyzing the data. They include 


distribution versus regression and parametric 
versus nonparametric methods. 


3.1 Distribution versus regression methods 


The distribution method—sometimes linked to 
life table analysis—examines the distribution of 
time until an event occurs, or the time between 
events. This is one of the most common meth- 
ods applied in demographic studies, and it 
is basically nonparametric in nature.² In this
method, survival times are typically measured 
at monthly (or other) intervals permitting the 
computation of detailed survival trajectories for 
each cohort. 

Two related functions are used in the 
analysis: (1) the survival probabilities—the 
cumulative proportion surviving at the end of a
specified time interval, or 1 minus recidivism 
rate; and (2) the hazard rate—the probability 
that persons who did not experience an event 
at the beginning of a specified time interval will 
experience the event during that interval. It is 
the rate of the occurrence of the event during 
the risk period (see Figure 26.1). 
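
As a rough illustration of how these two functions are obtained, the sketch below recomputes a few columns of the life table in Figure 26.1 from the counts alone. The formulas are the usual actuarial ones (withdrawals count as half an interval of exposure, and the hazard is evaluated at the interval midpoint); that these are the formulas behind Figure 26.1 is an assumption, although they do reproduce the printed values for the rows used here. The function name `life_table` is illustrative.

```python
def life_table(rows, width=1.0):
    """Compute actuarial life-table quantities.

    rows: list of (number_entering, number_withdrawn, terminal_events),
          one tuple per interval, in time order.
    """
    table = []
    cum_surviving = 1.0
    for entering, withdrawn, events in rows:
        exposed = entering - withdrawn / 2.0        # number exposed to risk
        q = events / exposed if exposed > 0 else 0.0  # proportion terminating
        p = 1.0 - q                                   # proportion surviving
        previous = cum_surviving
        cum_surviving *= p                            # cumulative survival
        table.append({
            "exposed": exposed,
            "terminating": q,
            "surviving": p,
            "cum_surviving": cum_surviving,
            "density": (previous - cum_surviving) / width,
            "hazard": 2.0 * q / (width * (1.0 + p)),  # midpoint hazard estimate
        })
    return table

# Counts for months 7, 8 and 9 of Figure 26.1 (no events occur before month 7).
rows = [(1199, 0, 33), (1166, 0, 23), (1143, 0, 19)]
result = life_table(rows)

for month, row in zip((7, 8, 9), result):
    print(month,
          round(row["terminating"], 4),
          round(row["cum_surviving"], 4),
          round(row["hazard"], 4))
```

The cumulative proportion surviving is simply the running product of the interval survival proportions, so one minus that value is the cumulative recidivism rate at the end of each month.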

Much of the early work on event history 
analysis in the biomedical area, and the intra- 
and inter-cohort analysis of recidivism rates in 
criminology, can be included in this category 
(Ekland-Olson et al., 1991; Joo, 1993). In Joo’s
(1993) inter-cohort recidivism study, for exam- 
ple, this research design allowed for a 36-month 
follow-up for all four release cohorts to deter- 
mine not only if parolees were reincarcerated 
but, if so, how long after release they returned to
prison. The life table method examines the pace 
at which the offenders recidivate at monthly 


² “Nonparametric statistics are designed to be used
when the data being analyzed depart from the dis- 
tributions that can be analyzed with parametric 
statistics. In practice, this most often means data 
measured on a nominal or an ordinal scale. Non- 
parametric tests generally have less power than para- 
metric tests. The chi-square test is a well-known 
example” (Vogt, 1999, p. 192). 


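Columns, left to right: (1) interval start time; (2) number entering
this interval; (3) number withdrawn during interval; (4) number exposed
to risk; (5) number of terminal events; (6) proportion terminating;
(7) proportion surviving; (8) cumulative proportion surviving at end;
(9) probability density; (10) hazard rate; (11) SE of cumulative
surviving; (12) SE of probability density; (13) SE of hazard rate.
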
0.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
1.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
2.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
3.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
4.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
5.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
6.0 1199.0 0.0 1199.0 0.0 0.0000 1.0000 1.0000 0.0000 0.0000 0.000 0.000 0.000 
7.0 1199.0 0.0 1199.0 33.0 0.0275 0.9725 0.9725 0.0275 0.0279 0.005 0.005 0.005 
8.0 1166.0 0.0 1166.0 23.0 0.0197 0.9803 0.9533 0.0192 0.0199 0.006 0.004 0.004 
9.0 1143.0 0.0 1143.0 19.0 0.0166 0.9834 0.9374 0.0158 0.0168 0.007 0.004 0.004 
10.0 1124.0 0.0 1124.0 41.0 0.0365 0.9635 0.9033 0.0342 0.0372 0.009 0.005 0.006 
11.0 1083.0 0.0 1083.0 26.0 0.0240 0.9760 0.8816 0.0217 0.0243 0.009 0.004 0.005 
12.0 1057.0 0.0 1057.0 21.0 0.0199 0.9801 0.8641 0.0175 0.0201 0.010 0.004 0.004 
13.0 1036.0 0.0 1036.0 14.0 0.0135 0.9865 0.8524 0.0117 0.0136 0.010 0.003 0.004 
14.0 1022.0 0.0 1022.0 24.0 0.0235 0.9765 0.8324 0.0200 0.0238 0.011 0.004 0.005 
15.0 998.0 0.0 998.0 26.0 0.0261 0.9739 0.8107 0.0217 0.0264 0.011 0.004 0.005 
16.0 972.0 0.0 972.0 22.0 0.0226 0.9774 0.7923 0.0183 0.0229 0.012 0.004 0.005 
17.0 950.0 0.0 950.0 17.0 0.0179 0.9821 0.7781 0.0142 0.0181 0.012 0.003 0.004 
18.0 933.0 0.0 933.0 30.0 0.0322 0.9678 0.7531 0.0250 0.0327 0.012 0.005 0.006 
19.0 903.0 0.0 903.0 27.0 0.0299 0.9701 0.7306 0.0225 0.0304 0.013 0.004 0.006 
20.0 876.0 0.0 876.0 18.0 0.0205 0.9795 0.7156 0.0150 0.0208 0.013 0.004 0.005 
21.0 858.0 0.0 858.0 7.0 0.0082 0.9918 0.7098 0.0058 0.0082 0.013 0.002 0.003 
22.0 851.0 0.0 851.0 19.0 0.0223 0.9777 0.6939 0.0158 0.0226 0.013 0.004 0.005 
23.0 832.0 0.0 832.0 13.0 0.0156 0.9844 0.6831 0.0108 0.0157 0.013 0.003 0.004 
24.0 819.0 0.0 819.0 11.0 0.0134 0.9866 0.6739 0.0092 0.0135 0.014 0.003 0.004 


Figure 26.1 Life table: 24 months follow-up period 




intervals. In this sense, the design is longitu- 
dinal, providing for the evaluation over time 
of inmates released in a given year as well 
as the assessment of changes across cohorts of 
parolees. The life table method, however, does
not permit the use of covariates or explanatory
variables in the model. The example presented
below therefore focuses on regression-like
methods in which the occurrence of events (or
hazard rate) depends on a linear function of
explanatory variables. This approach has become
popular in the study of recidivism (e.g.,
Farrington and Tarling, 1985; Schmidt and Witte,
1987 and 1988; Ekland-Olson et al., 1991; Joo,
1993). Schmidt and Witte (1988) point out the
usefulness of survival models which include
explanatory variables:


The use of explanatory variables allows one to make 
statements about the way in which survival time 
is influenced by individual characteristics, criminal 
history, or structural variables, and it also allows one 
to make predictions for individuals and not just for 
random samples of releases. 


As mentioned above, the hazard rate or h(t) 
is used as a dependent variable in this method, 
and it controls both the occurrences and the 
timing of events. 


3.2 Parametric³ versus nonparametric
methods 


Researchers have relied on a variety of model 
specifications to predict time until recidivism. 
There are two major groups of methods for 


³ Parametric statistics are “statistical techniques
designed for use when data have certain 
characteristics—usually when they approximate 
a normal distribution and are measurable with 
interval or ratio scales. Also statistics used to test 
hypotheses about population parameters” (Vogt, 
1999, p. 206). 


analyzing hazard rates: parametric and non- 
parametric methods. Parametric methods esti- 
mate the effects of explanatory variables, or 
covariates, on hazard rates. We can analyze 
both “time-invariant” covariates, which do not 
vary throughout the duration of observation 
(e.g., race, gender, and age at first arrest) 
and “time-varying” explanatory variables (e.g., 
age, income, and marital status) in parametric 
models. 

On the other hand, nonparametric methods 
do not specify the relation between hazard 
rates and covariates. Instead, separate estimates 
of hazard rates as a function of time (e.g., 
hazard rates calculated in a monthly interval) 
are obtained for each group of time-invariant 
categorical variables such as gender and race 
(Yamaguchi, 1991, p. 3). In most criminological
studies, as Schmidt and Witte (1988, p. 18)
pointed out, parametric models are more commonly
used and “fit recidivism data better and provide
more accurate predictions than nonparametric
models do.”

While no specific distribution is assumed 
for the time until recidivism in nonparametric 
methods, parametric methods assume a certain 
type of distribution in which the hazard rates 
depend on time. One of the advantages of para- 
metric event history models lies in their capa- 
bility to directly model the time dependency 
exhibited in event history data, and this can be 
done by specifying a distribution function for 
the hazard rate (Box-Steffensmeier and Jones, 
2004). When there are strong theoretical expec- 
tations or previous empirical findings regarding 
the shape of the hazard rate, parametric mod- 
els would be most reasonable. If the researcher 
knows or suspects that the hazard rate increases 
or decreases over time, then one may specify 
a distribution that reflects such a relationship. 
By correctly specifying the shape of the hazard 
rate, the researcher can obtain better estimates 
of the time dependency in the data as well as 
more precise estimates of covariate parameters 
(Box-Steffensmeier and Jones, 2004, p. 21). 

Among the variety of parametric models, the
choice of a particular model depends primarily
on the shape of the hazard rate function [h(t)
or log h(t)]. In the process of model selection,
as Allison (1984, pp. 30-31) suggests, the
“first choice is between the exponential
regression model, in which there is no dependence
on time, and all others.... If the exponential
model is rejected, one must then choose between
monotonic models (in which the hazard always
increases or always decreases with time) and
nonmonotonic models (in which the hazard may
sometimes increase and sometimes decrease).”

The exponential model assumes that the baseline
hazard rate is flat with respect to time. This
means that the hazard rate is constant or the
same at all points in time. We can express the
hazard rate for the exponential distribution as

$$h(t) = \lambda, \qquad (t > 0,\ \lambda > 0) \tag{1}$$

where $\lambda$ is a positive constant. On the other
hand, the baseline hazard of monotonic models
such as the Weibull model can monotonically
increase, monotonically decrease, or remain
constant with respect to time. To show how the
Weibull distribution can monotonically vary
with time, the hazard rate for the Weibull model
is given by

$$h(t) = \lambda p\,(t)^{p-1}, \qquad (t > 0,\ \lambda > 0,\ p > 0) \tag{2}$$

where $\lambda$ is a positive constant and p is the
shape parameter, which determines the shape
of the hazard rate. When p > 1, for example,
the hazard rate monotonically increases with
time; when p < 1, the hazard rate monotonically
decreases with time; when p = 1, the
hazard is a constant value $\lambda$, which is flat
(Box-Steffensmeier and Jones, 2004, p. 25). The
exponential model thus can be considered a
special case of the Weibull model.
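
The three cases of the shape parameter can be checked numerically. The sketch below assumes the Weibull hazard in the form h(t) = λ p t^(p-1) and uses hypothetical parameter values; the function names are illustrative and not part of any statistical package.

```python
def exponential_hazard(t, lam):
    """Constant hazard: the same value at every time point."""
    return lam

def weibull_hazard(t, lam, p):
    """Weibull hazard, assumed here to take the form lam * p * t**(p - 1)."""
    return lam * p * t ** (p - 1)

LAM = 0.02                 # hypothetical rate parameter
times = [1, 6, 12, 24, 36]  # months since release

increasing = [weibull_hazard(t, LAM, 1.5) for t in times]  # p > 1
decreasing = [weibull_hazard(t, LAM, 0.5) for t in times]  # p < 1
constant = [weibull_hazard(t, LAM, 1.0) for t in times]    # p = 1

for label, values in (("p=1.5", increasing),
                      ("p=0.5", decreasing),
                      ("p=1.0", constant)):
    print(label, [round(v, 4) for v in values])
```

With p = 1 the Weibull values coincide with `exponential_hazard`, which is the sense in which the exponential model is a special case.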

While the Weibull model permits the hazard
to change in one direction with respect to time,
the assumption of a monotonic hazard rate is
somewhat unrealistic for many research questions
in social science research. In many cases, there
are good reasons to suspect that the hazard rate
changes nonmonotonically with time. In fact, most
survival models in criminology have a nonmonotonic
hazard rate, in particular a hazard rate that
first rises and then falls.

The log-normal and log-logistic models
assume a nonmonotonic hazard distribution in 
which hazard first increases, reaches a peak, 
and then gradually declines (Allison, 1984; 
Schmidt and Witte, 1988; Yamaguchi, 1991; 
Joo et al., 1995; Box-Steffensmeier and Jones, 
2004). Since this is a pattern found in most 
of the recidivism data in criminology (see 
Figure 26.2), we rely on these two specifications 
for the multivariate prediction of survival time, 
which will follow in the next section. 
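
The rise-and-fall shape can be seen by evaluating the two hazard functions directly. The sketch below uses the standard textbook forms of the log-logistic and log-normal hazards with hypothetical parameter values; it is not fitted to the recidivism data discussed here. For the log-logistic form used, the hazard is nonmonotonic only when the shape parameter p exceeds 1.

```python
import math

def loglogistic_hazard(t, lam, p):
    """Log-logistic hazard: lam*p*(lam*t)**(p-1) / (1 + (lam*t)**p)."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p)

def lognormal_hazard(t, mu, sigma):
    """Log-normal hazard: density divided by the survival function."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    return density / survival

# Hypothetical parameters chosen so that the peak falls near month 20.
LAM, P = 0.05, 2.0
MU, SIGMA = 3.0, 1.0

# With p > 1 the log-logistic hazard peaks at ((p - 1) ** (1 / p)) / lam.
peak_time = (P - 1.0) ** (1.0 / P) / LAM

for t in (1, 5, 10, 20, 40, 60, 200):
    print(t,
          round(loglogistic_hazard(t, LAM, P), 4),
          round(lognormal_hazard(t, MU, SIGMA), 4))
```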

Figure 26.2 Hazard rates for the 1984-1987
property cohorts

Source: Joo et al. (1995). Recidivism among paroled
property offenders released during a period of prison
reform. Criminology, Vol. 33 (No. 3).

Log-normal and log-logistic models are specific
cases of a general class of models known as
accelerated failure time models. If T is the mean
(survival) time until recidivism, these models
may be written as


$$\log T = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k + e \tag{3}$$


where e is a random disturbance term that
depends on time and explanatory variables.
Each specific model is determined by the
distribution assumed for the disturbance term e.
Commonly assumed distributions include the
extreme value, normal, and logistic distributions,
which give rise to the corresponding Weibull,
log-normal, and log-logistic distributions for
the mean time until recidivism (Allison, 1984,
p. 30).

As Blossfeld et al. (1989, p. 249) point out, 
“the log-logistic model, along with the log- 
normal distribution, are the most com- 
monly recommended distributions, if an initial 
increasing and then a decreasing risk is pre- 
sumed to exist.” In a recidivism study, which 
will follow in the next section as an example 
of empirical study utilizing parametric event 
history analysis, a comparison will be made 
between the two models to determine which 
one fits the data better (see Schmidt and Witte 
(1988) and Box-Steffensmeier and Jones (2004) 
for a more detailed discussion of log-normal 
and log-logistic models). 

The aim of the following study is to estimate 
an appropriate prediction model of the mean 
time until reincarceration to determine how the 
survival time depends on individual parolees’ 
characteristics, such as age at release, gender, 
race/ethnicity, risk assessment score, and prior 
incarceration offense. These models also allow
us to estimate the effect of individual
characteristics on the time until recidivism. For
example, as Chung et al. (1991, p. 60) illustrate,
“if any of the parolee characteristics is a dummy
variable indicating participation or
nonparticipation in some correctional program,”
then using a survival time model, we can estimate
the effect of a certain correctional program on
the length of time until recidivism by properly
controlling for the other explanatory variables in
the equation. Because the data show a nonmonotonic
pattern for the hazard rates, log-normal and
log-logistic models are computed and compared in
order to estimate appropriate multivariate
prediction models of the mean time until
recidivism. The aim is to illustrate the estimation
of how the survival time depends on parolees’
characteristics, with some attention to the
identification of factors related to the probability
and the timing of parole failure.


4 Multivariate prediction of 
survival time: an example of 
log-normal and log-logistic event 
history analysis 


4.1 Introduction 


Given prison overcrowding and the ensuing
judicial and political pressures over
ever-worsening prison conditions, there has been
increasing interest in finding ways to use the
limited available prison space in the most
effective manner for controlling crime. Many
studies have indicated that a relatively small group
of offenders are responsible for a dispropor- 
tionately high volume of criminal activity. In 
this respect, selective incapacitation, as a strat- 
egy for achieving crime control, has gained 
increased attention. Central to this policy is the 
problem of predicting an individual’s criminal- 
ity, as well as various ethical and policy issues 
posed by this prediction and resulting policy 
(Greenwood, 1982; Greenwood and Abrahamse, 
1982; Cohen, 1983; Blumstein, Cohen, Martin 
and Tonry, 1983; von Hirsch and Gottfredson,
1983-84; Klein and Caggiano, 1986; Blumstein, 
Cohen, Roth and Visher, 1986; Tonry, 1987; 
Gottfredson and Tonry, 1987; Haapanen, 1990; 
Joo et al., 1995; Petersilia, 2003). 

Selective incapacitation involves “individu- 
ally based” sentences. Such sentences would 
vary with differences in predictions of the indi- 
vidual propensity to commit future crimes. 
Therefore, the effectiveness of a selective
incapacitation policy in reducing crime depends on
the capability to identify those offenders who
continue to threaten public safety. As
Ekland-Olson et al. (1991, p. 101) also point out, “as states
continue to rely on parole as a means of easing 
prison crowding and as political pressure accu- 
mulates regarding associated public safety risks, 
prediction of who is more or less likely to recidi- 
vate becomes increasingly important.” 

In event history analysis, as mentioned above, 
we can parameterize both the probability of 
eventual failure and the distribution of sur- 
vival times for those who will ultimately fail. 
In a previous study (Joo, 1993), we reconfirmed
commonly found correlates of recidivism
by survival analysis (the distribution or
life-table method in particular). That is, an
individual’s risk assessment score,
race/ethnicity, age, and release offense were
found to be very important factors in predicting
the probability of returning to prison
among property offenders across cohorts. Some
variations were also found in the probability 
of reincarceration across individual parolees’ 
characteristics. 

However, the prediction of parole outcome 
is not limited to success or failure. The trends 
in the hazard rates in Joo et al. (1995) indicate 
that there are important differences in the tim- 
ing of reincarceration across parolee character- 
istics, as well as across cohorts. The objective 
here is to construct and estimate an appropri- 
ate multivariate prediction model of the length 
of the time from release to return to prison (or 
survival time), with attention to how the fac- 
tors that explain parole outcome also are related 
to the timing of reincarceration among paroled 
property offenders. 

In our previous study (Joo, 1993), which 
examined both intra-cohort variation in rein- 
carceration and inter-cohort comparison of 
reincarceration rates, our focus was on the 
overall probabilities and the shifting patterns of 
reincarceration during this period of dramati- 


cally revised criminal justice policies. In addi- 
tion, we compared a cohort of inmates released 
under the Prison Management Act (PMA) with 
a comparable release cohort (i.e., 1987 prop- 
erty cohort) consisting of inmates who did not 
receive accelerated release under the provi- 
sion of the PMA. The aim of this comparison 
was to examine if PMA releasees differ from 
non-PMA releasees in recidivism pattern. The 
primary analytic strategy was to compare the 
three-year survival patterns of four successive 
parole cohorts through the use of the distribu- 
tion (or life-table) method. 

In multivariate prediction of survival time, 
however, our focus is placed on regression-like 
models in which the occurrence of an event 
depends on several explanatory variables, such 
as parolees’ characteristics and prior criminal 
history. We use these models to make predic- 
tions of the mean time until reincarceration, 
and to estimate the effect of individual charac- 
teristics on survival time. 


4.2 Analysis of survival time 


The survival time model (regression-like 
method here) focuses on those who returned to 
prison and examines the length of time from 
release to reincarceration. The aim is to esti- 
mate how the survival time depends on individ- 
ual parolees’ characteristics. They include age 
at release, gender, race/ethnicity, assessed risk, 
and prior incarceration offense. All of these 
explanatory variables are constant in value over 
the follow-up period. The dependent variable 
in this analysis is the length of time from release 
on parole to return to prison and includes cen- 
sored observations. 

To simplify the interpretation of the model, 
we recoded some of the explanatory vari- 
ables in the analysis. Prior incarceration offense 
(PIO) for paroled property offenders was col- 
lapsed into four property categories—burglary, 
larceny/theft, fraud/forgery, and motor vehicle 
theft. Age was divided into three categories— 
18 to 27, 28 to 37, and 38 and older. The 
coding of gender, race/ethnicity (Anglo,
African-American, and Latino), and risk assessment
score (high, medium, and low) was not altered.
Parolees who did not fail during the 36-month
follow-up period are considered censored. Similar
to dummy variable regression, each variable has a
reference group. In the analyses that follow, the
reference groups are: 38+, low risk,
“fraud/forgery” prior offense, female, and Anglo.

As mentioned above, the selection of a par- 
ticular parametric model depends primarily on 
the distribution of the hazard rates. In this con- 
nection, the log-normal and log-logistic models 
assume a nonmonotonic hazard distribution, 
specifically one that initially increases and then 
declines. Since this is the same pattern found 
in the hazard distribution of our data, we fit
the two distributions to the data by the maximum
likelihood method to determine which one fits
our data better.
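
To make the role of censored observations in maximum likelihood estimation concrete, the following sketch writes out the log-likelihood for each distribution. An observed return to prison contributes the log density of its survival time, while a censored parolee contributes the log probability of surviving beyond the follow-up period. The data, the parameter values, and the function names are hypothetical, and no optimizer is included; in practice a statistical package would maximize these functions over the regression coefficients and the scale parameter.

```python
import math

def lognormal_loglik(data, mu, sigma):
    """Censored log-likelihood when log T is normal(mu, sigma)."""
    total = 0.0
    for t, event in data:
        z = (math.log(t) - mu) / sigma
        if event:   # observed failure: log density of T
            total += -0.5 * z * z - math.log(t * sigma * math.sqrt(2.0 * math.pi))
        else:       # censored: log survival probability
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return total

def loglogistic_loglik(data, mu, scale):
    """Censored log-likelihood when (log T - mu) / scale is standard logistic."""
    total = 0.0
    for t, event in data:
        z = (math.log(t) - mu) / scale
        if event:   # log of exp(z) / (scale * t * (1 + exp(z))**2)
            total += z - math.log(scale * t) - 2.0 * math.log1p(math.exp(z))
        else:       # log of 1 / (1 + exp(z))
            total += -math.log1p(math.exp(z))
    return total

# Hypothetical sample: (months to return, returned?) with a 36-month follow-up.
sample = [(8, True), (14, True), (22, True), (31, True), (36, False), (36, False)]

print(round(lognormal_loglik(sample, mu=3.3, sigma=1.0), 3))
print(round(loglogistic_loglik(sample, mu=3.3, scale=0.6), 3))
```

Because both models here have the same number of parameters, their maximized log-likelihoods can be compared directly; the larger (less negative) value indicates the better fit.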


Variation in survival time among four 
successive cohorts 

Table 26.1 reports estimates for both log-normal
and log-logistic models for the two early release
cohorts: the coefficient (b), the anti-log of b,
and the t ratio. The coefficient estimates (b) are
like unstandardized regression coefficients in
that they depend on the metric of each explanatory
variable. A negative coefficient indicates that
an increase in the corresponding explanatory
variable decreases the mean survival time until
recidivism. Since the coefficients are in the log
of length of time, the anti-logs have a more
straightforward interpretation: the anti-log
of the coefficient represents the proportion of
time from release to return compared to the
reference group. For example, the anti-log
coefficient for high risk for the 1984 cohort (.65)
indicates that high-risk parolees survived only
65% of the length of time that low-risk parolees
survived. The asymptotic t ratio for each
coefficient is calculated by dividing the parameter
estimate by its asymptotic standard error.
Like standardized regression coefficients (β),


these t statistics, under the null hypothesis that 
each coefficient is zero, are metric-free and give 
some indication of the relative importance of 
the explanatory variables. Also presented are 
the log-likelihood values and the significance 
of the variables in the models. 
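
The anti-log interpretation can be reproduced with a few lines of arithmetic. The sketch below takes the log-normal coefficients reported for the 1984 cohort in Table 26.1, exponentiates them, and also shows how effects combine: because the model is additive in log time, the time ratios multiply. The combined profile at the end is an illustrative calculation, not a result reported in the chapter.

```python
import math

# Log-normal coefficients (b) for the 1984 cohort, as reported in Table 26.1.
coefficients = {
    "high risk": -0.43,
    "medium risk": -0.29,
    "African-American": -0.25,
    "Latino": -0.33,
    "burglary": -0.39,
    "motor-vehicle theft": -0.66,
}

# Anti-log of b: survival time as a proportion of the reference group's.
time_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in time_ratios.items():
    print(f"{name:20s} b = {coefficients[name]:5.2f}  anti-log = {ratio:.2f}")

# Illustrative combined profile: a high-risk Latino parolee whose prior
# offense was motor-vehicle theft, relative to a parolee in all three
# reference categories (low risk, Anglo, forgery/fraud).
profile = ("high risk", "Latino", "motor-vehicle theft")
combined_ratio = math.exp(sum(coefficients[name] for name in profile))
print(f"combined time ratio = {combined_ratio:.2f}")
```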

The overall results from these two models for
the 1984 cohort are very similar. The
log-likelihood values are -775.6 for the log-normal
model and -784.4 for the log-logistic model. These
values indicate that the log-normal model fits the
data slightly better than the log-logistic model.
The results from the estimation of both log-normal
and log-logistic regressions for the 1984 cohort
indicate that release offense, race/ethnicity, and
assessed risk have significant net effects on the
survival time.
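
The model comparison can be laid out for all four cohorts using the log-likelihood values reported in Tables 26.1 and 26.2. The sketch below also computes the Akaike information criterion (AIC) as one conventional summary; AIC is not used in the chapter itself, and the parameter count of 12 per model (constant, ten dummy variables, and the scale) is an assumption read off the tables. The two models are not nested, so no likelihood ratio test is attempted.

```python
# Log-likelihood values as reported in Tables 26.1 and 26.2.
log_likelihoods = {
    1984: {"log-normal": -775.6, "log-logistic": -784.4},
    1985: {"log-normal": -589.8, "log-logistic": -594.3},
    1986: {"log-normal": -1044.5, "log-logistic": -1055.3},
    1987: {"log-normal": -1171.8, "log-logistic": -1178.7},
}

N_PARAMS = 12  # assumed: constant + 10 dummy variables + scale parameter

def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood

preferred = {}
for cohort, fits in log_likelihoods.items():
    # With equal parameter counts, the higher log-likelihood wins.
    preferred[cohort] = max(fits, key=fits.get)
    gap = fits["log-normal"] - fits["log-logistic"]
    print(cohort, preferred[cohort],
          f"log-likelihood gap = {gap:.1f}",
          f"AIC(log-normal) = {aic(fits['log-normal'], N_PARAMS):.1f}")
```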

The effect of prior incarceration offense is
highly significant. Parolees with a prior
motor-vehicle theft conviction have a survival
time that, all else equal, is 52% of that for the
reference group of forgery/fraud. Parolees with
the prior incarceration offense of burglary have a
survival time that is about 68% of that for
parolees with the prior offense of forgery/fraud.
The significant effects are not limited to the
comparisons of motor-vehicle theft and burglary
with forgery/fraud. The difference between
motor-vehicle theft and burglary is also
significant.

The effects of race/ethnicity confirm our ear- 
lier findings that Anglo parolees survive longer 
than minorities. Both African-American and 
Latino parolees released in 1984 had survival 
times that were approximately 78% and 72% 
of the survival time, respectively, experienced 
by Anglos. These results indicate that Latino 
parolees had the lowest expected survival times 
among paroled property offenders. 

As expected, high-risk parolees have a 
survival time that is 65% that of low-risk 
parolees. Those with medium-assessed risk 
have a survival time that is about 75% of 
that for the category of low risk. Although 
gender is not found as an important predictor 


Table 26.1 Estimates for log-normal and log-logistic models predicting the possibility of recidivism: 1984 and 1985 cohorts

                    1984 cohort                                         1985 cohort
                    Log-normal model          Log-logistic model        Log-normal model          Log-logistic model
Characteristics     b       Anti-log  t       b       Anti-log  t       b       Anti-log  t       b       Anti-log  t

Constant            4.78    118.75    16.31   4.78    118.63    16.19   5.36    213.15    16.40   5.32    203.36    16.17

Race/Ethnicity
  African-American  -.25*   .78       -2.36   -.28*   .76       -2.63   -.51**  .60       -4.34   -.52*   .60       -4.54
  Latino            -.33*   .72       -2.37   -.35*   .71       -2.52   -.37*   .69       -2.53   -.39**  .68       -2.73
  Anglo             (reference)

Gender
  Male              -.24    .79       -1.20   -.27    .76       -1.33   -.66*   .52       -2.62   -.60*   .55       -2.40
  Female            (reference)

Age
  18-27             .07     1.08      0.48    .09     1.09      0.57    -.23    .80       -1.42   -.26    .63       -1.65
  28-37             .18     1.20      1.14    .16     1.17      1.04    -.44*   .65       -2.73   -.47**  .81       -2.97
  38+               (reference)

Risk score
  High              -.43*   .65       -2.31   -.42*   .66       -2.26   -.47*   .63       -2.55   -.47**  .63       -2.59
  Medium            -.29*   .75       -2.01   -.31*   .74       -2.07   -.19    .83       -1.31   -.21    .81       -1.45
  Low               (reference)

Release offense
  Burglary          -.39*   .68       -2.45   -.37*   .69       -2.32   -.18    .83       -1.09   -.17    .84       -1.06
  Larceny/Theft     -.17    .85       -.97    -.14    .87       -.78    .13     1.14      .72     .15     1.17      .85
  MVT               -.66**  .52       -3.01   -.64*   .53       -3.00   -.33    .72       -1.41   -.34    .71       -1.54
  Forgery/Fraud     (reference)

Scale               1.17                      .67                       1.10                      .62

Log-likelihood      -775.6                    -784.4                    -589.8                    -594.3


*Significant at .05 level.
**Significant at .01 level.
of survival time in this cohort, males survive
approximately 79% of the length of time female 
parolees survive. 

As mentioned above, the asymptotic t ratios
are useful statistics for moderate to large
samples. If the absolute value of the ratio
exceeds 2, the coefficient is significantly
different from zero at the .05 level with a
two-tailed test. In addition, the relative sizes
of these ratios can be used to measure the
relative importance of the variables, given that
all predictors are in the model. In this cohort,
we see that release offense, race/ethnicity, and
risk score have significant effects on the time
until recidivism. In particular, the prior
incarceration offense of motor-vehicle theft is
the most significant predictor (-3.01) of the
length of time from release to reincarceration,
followed by burglary (-2.45), Latino (-2.37),
African-American (-2.36), and high risk (-2.31).
In this regard, the type of parolee most likely
to have a short survival time until
reincarceration is the Latino auto thief who is
assessed as high risk.
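
The rule of thumb that a t ratio beyond 2 in absolute value is significant at the .05 level follows from the large-sample normal approximation. The sketch below converts the 1984 log-normal t ratios from Table 26.1 into approximate two-tailed p-values and ranks the predictors by the absolute size of the ratio. The normal approximation and the helper name `two_tailed_p` are assumptions made for illustration; the chapter reports only the ratios and significance stars.

```python
import math

def two_tailed_p(t_ratio):
    """Two-tailed p-value under a standard normal approximation."""
    return math.erfc(abs(t_ratio) / math.sqrt(2.0))

# Asymptotic t ratios for the 1984 cohort, log-normal model (Table 26.1).
t_ratios = {
    "motor-vehicle theft": -3.01,
    "burglary": -2.45,
    "Latino": -2.37,
    "African-American": -2.36,
    "high risk": -2.31,
    "male": -1.20,
}

p_values = {name: two_tailed_p(t) for name, t in t_ratios.items()}

# Rank predictors by the absolute size of the t ratio (largest first).
ranking = sorted(t_ratios, key=lambda name: abs(t_ratios[name]), reverse=True)

for name in ranking:
    print(f"{name:20s} t = {t_ratios[name]:6.2f}  p = {p_values[name]:.4f}")

# The standard error is implied by the definition t = b / SE.
se_mvt = -0.66 / t_ratios["motor-vehicle theft"]
print(f"implied SE for motor-vehicle theft = {se_mvt:.3f}")
```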

The models for the 1985 release cohort show
a somewhat different pattern of effects from the
1984 models. Overall, the log-normal model fits
the 1985 data slightly better than the
log-logistic model. The log-likelihood values are
-589.76 and -594.26 for the log-normal model and
the log-logistic model, respectively.
Race/ethnicity has the most significant effect on
survival time (t = -4.34 for the log-normal
model). African-American and Latino parolees
survived only 60% and 69% of the length of time,
respectively, that Anglo parolees survived. The
differences between African-Americans and
Latinos are also statistically significant. Gender
differences in return time are quite strong.
Males have a survival time that averages about 
52% of that for female parolees. The gender 
effect in this model is considerably stronger 
than that reported for the previous cohort. 
Assessed risk also has a significant effect on 
return time. Parolees with high-assessed risk 
experienced a survival time that is only 63% 


of that for those with low risk. The comparable 
proportion for medium-risk releasees is 83%. 
The difference between high and medium risk 
is also significant. 

Finally, the effects of age are also 
significant—parolees aged 28 to 37 return more 
quickly compared to those 18 to 27 and 38 and 
over. Those aged 28 to 37 had a survival time 
that is only 65% of that for 38 and over. Parolees 
18 to 27 also have a lower expected survival time
compared to the oldest age group. However, 
there are no significant differences between the 
18-27 and 38+ age groups. For the 1985 cohort, 
the type of parolee most likely to return to 
prison shortly after release was the 28—37-year- 
old African-American male who was assessed 
as high risk. 

Table 26.2 presents estimates for both
log-normal and log-logistic models for the two
later release cohorts. For the 1986 release
cohort, the results from the log-normal model
are again very similar to the corresponding
results from the log-logistic model (the
log-likelihood values are -1044.5 for the
log-normal model and -1055.3 for the log-logistic
model). However, the patterns of effects for both
models are somewhat different from those for the
two previous cohorts. The three significant
predictors for the 1986 cohort are risk assessment
score, race/ethnicity, and age.

Once again, the effects of risk are the strongest 
(t = -5.4 for high risk for both models). Those
with high-assessed risk have a net survival 
time that is 43% of that for parolees with low- 
assessed risk. The group with medium-assessed 
risk also differs significantly from the reference 
category of low risk. Moreover, those with high 
risk are significantly different from those with 
medium risk. 

The effects of race/ethnicity are also very 
strong. The t ratios are -3.34 and -3.00 for
Latino and African-American, respectively. 
As was the case for the 1984 cohort, Latino 
parolees showed shorter survival time until 
reincarceration compared to African-American 


Table 26.2 Estimates for log-normal and log-logistic models predicting the possibility of recidivism: 1986 and 1987 cohorts

                    1986 cohort                                         1987 cohort
                    Log-normal                Log-logistic              Log-normal                Log-logistic
Characteristics     b       Anti-log  t       b       Anti-log  t       b       Anti-log  t       b       Anti-log  t

Constant            4.86    129.15    18.31   4.83    125.09    18.54   4.86    129.02    23.26   4.82    123.72    23.43

Race/Ethnicity
  African-American  -.29*   .75       -3.00   -.32*   .73       -3.38   -.24**  .79       -2.92   -.24**  .79       -2.97
  Latino            -.41*   .66       -3.34   -.45*   .64       -3.67   -.01    .99       -.08    .00     1.00      -.04
  Anglo             (reference)

Gender
  Male              -.24    .79       -1.27   -.20    .82       -1.12   -.33*   .72       -2.33   -.33*   .72       -2.40
  Female            (reference)

Age
  18-27             -.33*   .72       -2.32   -.34*   .71       -2.40   -.64*   .53       -4.92   -.65*   .52       -5.05
  28-37             -.12    .89       -.80    -.12    .88       -.87    -.39*   .68       -2.98   -.38*   .68       -2.94
  38+               (reference)

Risk score
  High              -.86**  .43       -5.41   -.86**  .43       -5.40   -.73*   .48       -5.42   -.76**  .47       -5.72
  Medium            -.37    .69       -2.63   -.36*   .70       -2.56   -.42*   .66       -3.79   -.42*   .66       -3.80
  Low               (reference)

Release offense
  Burglary          -.23    .79       -1.54   -.23    .79       -1.58   .14     1.15      1.26    .18     1.19      1.57
  Larceny/Theft     -.07    .93       -.43    -.06    .94       -.41    .06     1.06      .46     .07     1.07      .56
  MVT               -.12    .89       -.59    -.10    .90       -.50    -.12    .88       -.79    -.08    .92       -.54
  Forgery/Fraud     (reference)

Scale               1.18                      0.69                      1.10                      0.63

Log-likelihood      -1044.5                   -1055.3                   -1171.8                   -1178.7


*Significant at .05 level.
**Significant at .01 level.
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parolees. They have an expected length of time 
from release to reincarceration that is just under 
66% of that for the reference group of Anglos. 
African-American parolees (75%) also differ 
significantly from Anglos. However, the differ- 
ence between African-Americans and Latinos is 
not statistically significant. 

The parameter estimates for age indicate that
parolees aged 18 to 27 have a significantly lower
expected survival time compared to the oldest
age group. That pattern also holds for the 28 to
37-year-old age group. Finally, the 18 to 27 and
28 to 37 age groups are significantly different as
well. Males have an average survival time that
is 79% of that for females. However, gender is
not found to be an important predictor of
survival time in this cohort. The difference between
males and females in the time until recidivism
is also considerably smaller than that reported
for the 1985 cohort. For the 1986 cohort, young
Latino parolees with high assessed risk have
the highest likelihood of quickly returning to
prison after release.

The pattern of effects for the 1987 cohort is
almost identical to that for the 1985 cohort, in
that the same variables have estimated
coefficients significantly different from zero
in both cohorts. Overall, the 1987 model
is significant, with a log-likelihood of -1172
for the log-normal model and -1179 for the
log-logistic model.

High-risk parolees have a net survival time
that is only 48% of that for those with
low-assessed risk. Those with medium risk have
a time from release to reincarceration that
is 66% of that for the low-risk group. Finally,
high- and medium-risk parolees are also
statistically different.

Age differences in return time are quite
strong. Parolees aged 18 to 27 have a substantially
lower expected survival time than the other
age groups, and the difference from the oldest
age group is statistically significant. The
difference between the 18 to 27 and 28 to 37
groups is also significant.


With regard to race/ethnicity, African-American
parolees have significantly lower survival
times compared to Latinos and Anglos.
However, Latinos and Anglos are essentially the
same with regard to length of time from release
to return. The gender effect is important for this
cohort. Males have a significantly lower average
survival time compared to females (72%).
As was the case with the 1984 and 1985
cohorts, prior incarceration
offense in the 1987 cohort has a trivial effect
on return time, net of the other variables in the
model. The type of parolee in the 1987 cohort
most likely to return quickly to prison is a
young African-American male with a
high-risk assessment score.

In sum, the effects of risk are the strongest 
and the most consistent across the cohorts. We 
also observed consistently significant effects 
of race/ethnicity and age. With regard to 
race/ethnicity, the one persistent effect is that 
between Anglos and minorities. In two cases, 
however, the differences between Latinos and 
African-Americans are also statistically
significant. When significant, the effects of age
were rather strong, indicating that younger
parolees have not only a greater likelihood
of failure but also a shorter survival
time after release. Unlike violent offenders,
whose criminal inclination declines after
their late 20s or 30s, paroled property offenders
aged 28-37 still showed a relatively short
mean time until recidivism. Prior incarceration
offense is found to have a trivial net
effect. One likely explanation for this result
is that other characteristics, such as age and
risk, are associated with prior offense and, once
these factors are partialled out, offense has no
effect.

Following our previous studies (Ekland-Olson
et al., 1991; Joo, 1993), we compute predicted
values for survival time, since this is a useful
way to summarize the differences across
the four models. Based on the t ratios
for the log-normal model, we selected
the characteristics in each model that are
statistically significant and are also associated
with the lowest survival times. Taking
a worst-case scenario, the prediction equations
for the four cohorts consist of the following
characteristics:


1984: auto thief, high risk, Latino
1985: 28-37, high risk, African-American, male
1986: 18-27, high risk, Latino
1987: 18-27, high risk, African-American, male


To obtain predicted values for survival time, 
we multiply the intercepts by the anti-logs of 
the coefficients associated with the selected 
characteristics. The following equations were 
used to generate the predicted survival time (S) 
in months: 


1984 cohort: S = 119 * .52 * .65 * .72 = 29.0 months

1985 cohort: S = 213 * .65 * .63 * .60 * .52 = 27.2 months

1986 cohort: S = 129 * .72 * .43 * .66 = 26.4 months

1987 cohort: S = 129 * .53 * .48 * .79 * .72 = 18.7 months
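The arithmetic in these equations can be checked with a short script. This is a sketch only: the numbers are copied from the equations above, while the dictionary layout and function name are assumptions introduced for illustration.

```python
from math import prod

# Intercept anti-log followed by the anti-logs of the selected coefficients,
# copied from the worst-case equations (log-normal models).
worst_case_terms = {
    1984: [119, 0.52, 0.65, 0.72],
    1985: [213, 0.65, 0.63, 0.60, 0.52],
    1986: [129, 0.72, 0.43, 0.66],
    1987: [129, 0.53, 0.48, 0.79, 0.72],
}


def predicted_survival_months(terms):
    """Multiply the intercept anti-log by each coefficient anti-log."""
    return prod(terms)


for cohort, terms in worst_case_terms.items():
    print(cohort, round(predicted_survival_months(terms), 1))
```

Rounded to one decimal, the products agree with the predicted survival times reported in the equations.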


As indicated in Table 26.1 and Table 26.2, 
risk assessment score has the strongest impact 
on survival time. For the 1986 and 1987 cohorts, 
this conclusion is also based on a comparison 
of the size of the coefficient for high risk (e.g., 
anti-log of .43 and .48 for the 1986 and 1987 
cohorts, respectively). 

A useful way to illustrate the impact of risk 
is to generate predicted survival times by sub- 
stituting the coefficient for medium risk for the 
coefficient for high risk, leaving all of the other 
coefficients unchanged. The difference in pre- 
dicted values illustrates the impact of high risk 
on expected survival times. The following equa- 
tions were used to generate these revised pre- 
dicted values. 


1984 cohort: S = 119 * .52 * .75 * .72 = 33.4 months

1985 cohort: S = 213 * .65 * .83 * .60 * .52 = 35.9 months

1986 cohort: S = 129 * .72 * .69 * .66 = 42.3 months

1987 cohort: S = 129 * .53 * .66 * .79 * .72 = 25.7 months


Changing from high risk to the modal category
of medium risk has a substantial impact on
survival time. This change adds the following
numbers of predicted survival months:
4.5, 8.6, 15.9, and 7.0 months for the 1984, 1985,
1986, and 1987 cohorts, respectively.
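The substitution of the medium-risk anti-log for the high-risk anti-log can be sketched in the same way. The values below are taken from the two sets of equations in the text. The tuple layout and the function name `excess_months` are hypothetical, chosen only for this illustration.

```python
from math import prod

# For each cohort: (intercept anti-log, high-risk anti-log,
# medium-risk anti-log, anti-logs of the other selected characteristics).
cohorts = {
    1984: (119, 0.65, 0.75, [0.52, 0.72]),
    1985: (213, 0.63, 0.83, [0.65, 0.60, 0.52]),
    1986: (129, 0.43, 0.69, [0.72, 0.66]),
    1987: (129, 0.48, 0.66, [0.53, 0.79, 0.72]),
}


def excess_months(intercept, high, medium, others):
    """Predicted months gained when medium risk replaces high risk."""
    base = intercept * prod(others)  # everything except the risk term
    return base * medium - base * high


for cohort, terms in cohorts.items():
    print(cohort, round(excess_months(*terms), 1))
```

Working from the unrounded products reproduces the reported differences of 4.5, 8.6, 15.9, and 7.0 months. Subtracting the already rounded predictions instead gives slightly different figures for 1984 and 1985 (4.4 and 8.7).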

As indicated, the impact of high risk on the 
predicted time until recidivism is most evident 
for the 1986 cohort. Although the predicted
survival time for the 1986 cohort is larger than
expected due to the differences in the number of
significant variables across cohorts, this result
still supports our earlier finding on the
compositional difference in the high-risk category
in the 1986 cohort.


Variation in survival time between PMA and 1987 cohorts

Table 26.3 presents parameter estimates for
the PMA cohort, as compared to the 1987
non-PMA cohort, for both log-normal and
log-logistic models. The overall results from
the two models for the PMA cohort are very
similar. Unlike the four successive cohorts,
however, the log-likelihood values indicate that
the log-logistic model (-1089.8) fits the PMA
data slightly better than the log-normal model
(-1109.9). For the 1987 non-PMA cohort, the
log-likelihood of the log-normal model (-1171.8)
is a little higher than that of the log-logistic
model (-1178.7), but the two models give very
similar results. For both
cohorts, therefore, we use parameter estimates
from the log-logistic model to calculate the
survival time.


Table 26.3 Estimates for log-normal and log-logistic models predicting the possibility of recidivism: PMA and 1987 cohorts

                    PMA cohort                                      1987 cohort
                    Log-normal              Log-logistic            Log-normal              Log-logistic
Characteristics     b       Anti-log t      b       Anti-log t      b       Anti-log t      b       Anti-log t

Constant            5.54    254.68   16.79  5.27    194.42   17.89  4.86    129.02   23.26  4.82    123.72   23.43

Race/Ethnicity
  African-American  -.46**  .63      -3.60  -.40**  .67      -3.57  -.24**  .79      -2.92  -.24**  .79      -2.97
  Latino            .01     1.01     0.07   .00     1.00     .01    -.01    .99      -.08   .00     1.00     -.04
  Anglo (reference)

Gender
  Male              -.52*   .59      -2.30  -.44*   .64      -2.18  -.33*   .72      -2.33  -.33*   .72      -2.40
  Female (reference)

Age
  18-27             -.58**  .56      -2.73  -.51**  .60      -2.69  -.64**  .53      -4.92  -.65**  .52      -5.05
  28-37             -.67**  .51      -3.12  -.59**  .55      -3.06  -.39**  .68      -2.98  -.38**  .68      -2.94
  38+ (reference)

Risk score
  High              -1.20** .30      -5.79  -1.14** .32      -6.23  -.73**  .48      -5.42  -.76**  .47      -5.72
  Medium            -.39*   .68      -2.24  -.43**  .65      -2.73  -.42**  .66      -3.79  -.42**  .66      -3.80
  Low (reference)

Release offense
  Burglary          -.28    .76      -1.61  -.22    .80      -1.41  .14     1.15     1.26   .18     1.19     1.57
  Larceny/Theft     -.37*   .69      -2.02  -.23    .79      -1.42  .06     1.06     .46    .07     1.07     .56
  MVT               -.23    .79      -1.09  -.18    .84      -.94   -.12    .88      -.79   -.08    .92      -.54
  Forgery/Fraud (reference)

Scale               1.50                    0.78                    1.10                    0.63

Log-likelihood      -1109.9                 -1089.8                 -1171.8                 -1178.7

*Significant at .05 level.
**Significant at .01 level.




The results for the PMA cohort are identical
to the results for the non-PMA cohort in
terms of statistically significant variables. Both
log-normal and log-logistic regressions for the
PMA cohort indicate that risk assessment score,
race/ethnicity, age, and gender have significant
net effects on the survival time. However,
there are important differences in the magnitude
of the effects. Recall that the anti-log
(exponentiated coefficient) is interpreted as
the proportion of survival time
for a particular group compared to the reference
group.
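This interpretation follows from the log-linear form shared by the log-normal and log-logistic models. As a sketch in generic notation (the symbols $T$, $x_k$, $b_k$, $\sigma$, and $\varepsilon$ are assumptions introduced here, not the chapter's own notation), the model for survival time $T$ can be written as:

```latex
\log T = b_0 + \sum_{k} b_k x_k + \sigma \varepsilon
\quad\Longrightarrow\quad
T = e^{b_0} \, \prod_{k} \left( e^{b_k} \right)^{x_k} \, e^{\sigma \varepsilon}
```

Here $\varepsilon$ is a standard normal error for the log-normal model and a standard logistic error for the log-logistic model. For a dummy variable $x_k$, moving from the reference category ($x_k = 0$) to the category of interest ($x_k = 1$) multiplies survival time by $e^{b_k}$, with the other covariates held fixed. For example, $e^{-1.14} \approx .32$ for high-risk PMA releasees in the log-logistic model.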

The most dramatic difference between the 
PMA and 1987 cohorts is with regard to 
assessed risk. High-risk PMA releasees have, all 
else being equal, a survival time that is only 
32% of that of low-risk PMA parolees. The com- 
parable figure for the 1987 group is 47%. The 
effect of race/ethnicity is somewhat stronger 
for PMA releasees as is the effect of gen- 
der. African-American and male PMA releasees 
have a lower expected survival time compared 
to their non-PMA counterparts. While PMA
releasees aged 28-37 showed a lower expected
survival time than the 18-27 group, in the 1987
cohort the youngest age group exhibited the
lowest survival time.

Once again, as a useful way to summarize
the differences between these two cohorts, we
estimate the expected (or mean) survival time
for parolees with characteristics that are
statistically significant and are also associated
with a high likelihood of return. For the PMA
cohort, these characteristics are high risk, male,
African-American, and 28 to 37 years of age.
For the 1987 cohort, they are high risk, male,
African-American, and 18 to 27 years of age.
The predicted mean survival time
(mean time from release to return) is obtained
by multiplying the anti-log of the intercept by
the anti-log of the coefficient for each of these
characteristics. The following equations were used
to generate the predicted survival time (S) in
months:


PMA cohort: S = 194 * .32 (high) * .64 (male) * .67 (African-American) * .55 (28-37) = 14.6 months

1987 cohort: S = 124 * .47 (high) * .72 (male) * .79 (African-American) * .52 (18-27) = 17.2 months


While the expected average time from release to 
return for PMA parolees with these character- 
istics is 14.6 months, a counterpart non-PMA 
1987 parolee has an average survival time of 
17.2 months. The difference in the predicted 
survival time reflects the notable difference 
across these two cohorts in not only the level 
but also the timing of return. 

Once again, in an attempt to estimate the 
impact of risk on expected survival times, we 
substitute the coefficient for medium risk for 
the coefficient for high risk, leaving all of the 
other coefficients unchanged. The following 
equations were used to generate these revised 
predicted values. 


PMA cohort: S = 194 * .65 (medium) * .64 (male) * .67 (African-American) * .55 (28-37) = 29.7 months

1987 cohort: S = 124 * .66 (medium) * .72 (male) * .79 (African-American) * .52 (18-27) = 24.2 months


Changing from high risk to medium risk has
a substantial impact on survival time. This
change adds 15.1 and 7.0 predicted survival
months for the PMA and 1987 cohorts,
respectively. These differences
in predicted values illustrate the impact
of high risk on expected survival times. As
indicated, the impact of high risk on the predicted
time until recidivism is more evident for the
PMA cohort, which supports our finding on the
compositional difference in the high-risk category
in that cohort.


Summary of the empirical findings 

On the whole, we have found that the factors
thought to predict parole outcome on the basis
of our earlier descriptive results are also related
to the timing of parole failure. The salience of
several factors, such as
assessed risk, race/ethnicity, age, prior
incarceration offense, and gender, is reconfirmed in
predicting differences in the timing of parole
failure as well as parole outcome. The effects
of risk are the strongest and the most
consistent across the cohorts. We also observed
consistently significant effects of race/ethnicity
and age. With regard to race/ethnicity, the one
persistent effect is that between Anglos and
minorities. When significant, the effects of age
were rather strong. Unlike violent offenders,
whose criminal inclination declines after
their late 20s or 30s, however, paroled property
offenders aged 28-37 still showed a relatively
short mean time until recidivism.

In addition, the pattern of predicted values 
is somewhat consistent with our preliminary 
conclusions established with regard to trends 
in survival probabilities and the hazard rates 
in the previous two sections. The early cohorts 
(1984 and 1985) have very similar predicted 
survival times. The 1987 cohort has a predicted 
time that is between eight and ten months 
shorter than the two early cohorts. On the other 
hand, the 1986 cohort has an estimated sur- 
vival time only one to three months shorter 
than the prior two cohorts, which may be partly 
due to the differences in the number of sig- 
nificant variables in the estimation equations 
across cohorts (e.g., three variables for 1984 and 
1986 cohorts, and four variables for the 1985 
and 1987 cohorts). 

In addition, we compared PMA releasees
with the 1987 non-PMA group to examine the
possible impact of a legislative change, i.e.,
accelerated release under the Prison Management
Act, on the predicted length of time from
release to reincarceration. The results for the
PMA cohort are identical to the results for
the non-PMA cohort in terms of statistically
significant variables: risk assessment score,
race/ethnicity, age, and gender have significant
net effects on the survival time. However, there
are important differences in terms of the
magnitude of the effects. This finding is further
evidence of a possible reduction in the
deterrent influence on paroled property offenders
who experienced accelerated early release
under PMA.

In this regard, we estimate the expected (or 
mean) survival time for parolees with character- 
istics that are associated with a high likelihood 
of return. While the expected average survival 
time for PMA parolees with these character- 
istics is 14.6 months, a counterpart non-PMA 
1987 parolee has an average survival time of 
17.2 months. The difference in the predicted 
survival time between these two cohorts reflects 
differences not only in the level but also the 
timing of return. 

As in previous studies (Ekland-Olson
et al., 1991, pp. 126-128; Joo, 1993),
this finding suggests that the factors in each
cohort that maximize the level and the timing of
return to prison changed over time. This is
in part due to the compositional differences
between the two groups, but also partly due to
administrative changes in the policy of parole
revocations as well as legislative changes (e.g.,
PMA) that may have lessened the deterrent
effect of incarceration. While this analysis does
not allow us to measure the relative importance
of these factors, it is clear that compositional
differences play an important role, as
is evidenced by the substitution of the
coefficient for medium risk for the coefficient for high
risk. It is also noteworthy that the differences
between PMA and non-PMA cohorts in the
predicted survival time, as well as in the level and
the timing of return, are consistent with a
reduced deterrent effect.

The analysis of survival time provides impor- 
tant information about the pattern of return, 
which allows policymakers to identify parolee 
characteristics that are related to the timing of 
reincarceration. They can also use this analysis
to estimate the effect of a particular
individual characteristic or a certain correctional
program on the survival time, controlling for
the other variables.


Chapter 27


Discrete-time survival analysis: 
predicting whether, and if so when, 
an event occurs 


Margaret K. Keiley, Nina C. Martin, Janet Canino, 
Judith D. Singer and John B. Willett 


1 Introduction 


Researchers often ask whether and, if so, when, 
critical events in the life course occur. These 
questions are often difficult to address because 
of a problematic information shortfall, known 
as censoring, that often occurs when not every- 
one in the sample experiences the target event 
during the period for which the data were 
collected. In this chapter, we show how the 
methods of discrete-time survival analysis (aka 
event history analysis and hazard modeling) 
are ideal for studying event occurrence because 
they allow the even-handed incorporation of 
data from both the noncensored and censored 
cases alike. We use retrospective longitudinal 
data on the ages at which adolescents self-report 
that they had their first experience of sexual 
intercourse (the target event) to introduce fun- 
damental statistical quantities, the hazard and 
survival probability. Then, we generate, spec- 
ify, explain, fit, and interpret formal discrete- 
time hazard models of the relation between the 
risk of event occurrence and critical predic- 
tors, including predictors that describe the pas- 
sage of time itself. Finally, we describe how 


researchers who pose research questions about 
whether and when but who choose not to adopt 
a survival-analytic framework can easily be led 
astray by traditional statistical analysis. 

An important class of research question often 
posed in the social sciences asks “whether” 
and, if so, “when” a target event occurs. 
Researchers investigating the consequences of 
childhood traumas on later well-being, for 
instance, ask whether an individual ever
experiences depression and, if so, when onset
first occurs (Wheaton, Roszell and Hall,
1997). Other researchers ask questions about
whether and when street children return to 
their homes (Hagan and McCarthy, 1997), 
whether and when college students drop out 
of school (DesJardins, Ahlburg and McCall, 
1999), whether and when recently married cou- 
ples get divorced (South, 2001) and whether 
and when adolescent boys (Capaldi, Crosby 
and Stoolmiller, 1996) or university students 
(Canino, 2002; 2005) experience sexual inter- 
course for the first time. 

Familiar statistical techniques, such as multiple
regression analysis and analysis of variance,
and even their more sophisticated cousins, such
as structural equation modeling, are ill-suited
for addressing questions about the occurrence
and timing of events. These usually versatile
methods fail because they are unable to
handle situations in which the value of the 
outcome—i.e., whether and when an event 
occurs—is unknown for some of the people 
under study. When event occurrence is being 
investigated, however, this type of informa- 
tion shortfall is commonplace if not inevitable. 
No matter how long a researcher is funded to 
collect data, some people in the sample may 
not experience the target event while they are 
being observed—some adults will not become 
depressed, some street children will not return 
to their parental homes, some college students 
will not drop out of school, some recently mar- 
ried couples will not divorce, some adolescents 
will remain virgins. Statisticians say that such 
cases are censored. 

Censoring creates an important analytic 
dilemma that cannot be ignored. Although the 
investigator knows something important about 
the individuals with censored event times—if 
they do ever experience the event, they will 
do so after the observation period (the period 
for which the data were collected) ends—this 
knowledge is imprecise. If a university student 
does not experience sexual intercourse by age 
21, for example, we would not want to conclude 
that he or she will never do so. All we can say 
is that by age 21, he or she was still a virgin. 
Yet, the need to incorporate data from individu- 
als with censored and noncensored event times 
simultaneously into respectable data analyses is 
clear because the censored individuals are not 
a random subgroup of the sample. They are a
special group of people: the ones who are least
likely to experience the event, the ones who are
the “longest-lived” participants in the sample.
Consequently, they provide considerable infor- 
mation about the potential rarity of target event 
occurrence. Credible investigation of event 
occurrence requires a data-analytic method that 
deals evenhandedly with both the noncensored
and the censored observations. Biostatisticians
modeling human lifetimes to the event of death 
were initially stimulated to develop a class 
of appropriate statistical methods for analyz- 
ing such data because they faced the censoring 
problem constantly in medical research where, 
often (and thankfully), substantial numbers of 
their study participants did not die by the end of 
the observation period (Cox, 1972; Kalbfleisch 
and Prentice, 1980). Despite the foreboding
appellations of the techniques that were thus
developed (they became known variously as
survival analysis, event history analysis, and
hazard modeling), these techniques have now
become invaluable to social scientists outside 
the medical field because they provide a sound 
and reasonable statistical basis for exploring the 
“whether” and “when” of all kinds of interest- 
ing target events in the lives of participants. 

In this chapter, we provide a conceptual 
introduction to these survival methods, focus- 
ing specifically on the principles of discrete- 
time survival analysis. After distinguishing 
between discrete-time and continuous-time sur- 
vival analysis and explaining why we encour- 
age first-time learners to begin with the former 
approach, we use an example of retrospective 
longitudinal event-history data on the age at 
first sexual intercourse for a sample of college- 
going adolescents to introduce the fundamen- 
tal building blocks of these methods. These 
building blocks are known as the hazard and 
survival probabilities, and they offer two com- 
plementary ways of describing patterns in the 
risk of event occurrence over time. We then 
introduce and specify statistical models that 
can be used to link these temporal patterns of 
risk to selected predictors, including time itself, 
and we comment on the types of predictors that 
can easily be included in these models. We then 
show how discrete-time hazard models can be 
fitted to longitudinal data and model parame- 
ters estimated, tested, and interpreted. Finally, 
we comment on how easily researchers can be
misled if they resort to traditional data-analytic
techniques for addressing these same questions,
instead of adopting survival methods.
Our presentation is intended to be conceptual
and nontechnical; readers who are interested
in learning more about these methods should
consult Singer and Willett (2003) for additional
guidance.


2 Measuring time and recording 
event occurrence 


Before event occurrence can be investigated, a 
researcher must first record how long it takes, 
from some agreed-upon starting point, for each 
individual in a sample to experience the tar- 
get event, or be censored. The researcher must 
thus be able to define a “beginning of time” in 
some meaningful and unambiguous way, must 
establish a suitable metric in which the passage 
of time can be recorded, and must reliably
recognize the target event when it occurs. We 
comment on each of these three points briefly 
below. 

Depending on the particular research project 
in question, investigators possess a great deal 
of flexibility in identifying the “beginning of 
time.” Often, because physical birth is both 
handy and meaningful across a wide variety of 
substantive contexts, many researchers choose 
it as the “beginning of time” and consequently 
use an individual’s age (i.e., time since birth) as 
the metric in which time is measured (see, e.g., 
Wheaton et al., 1997). But researchers need not 
restrict themselves to birth and to the metric 
of chronological age. One way of establishing 
a beginning of time in a particular study is to 
tie it to the occurrence of some other precip- 
itating event—one that places all individuals 
in the population at risk of experiencing the 
forthcoming target event. When modeling how 
long it takes before street children return to 
their parental home, for example, the “begin- 
ning of time” is naturally defined as the time at 
which the prospective street children left their 
parental home for the streets for the first time
(thereby making “time on the street” the metric
for the ensuing survival analysis).

Once a common start time has been defined, 
the researcher must observe participants— 
either prospectively as time passes, or using ret- 
rospective reconstruction of the event history— 
to record whether and, if so, when, the tar- 
get event occurs to each. All participants who 
experience the target event during the obser- 
vation period are then assigned an event time 
equal to the time at which they actually experi- 
enced the event. Individuals who do not expe- 
rience the target event during the window of 
the observation period are assigned censored 
event times, set equal to the time at which the 
observation period ended or when the individ- 
ual was no longer at risk of experiencing the 
event, but labeled as “censored” to indicate that 
the target event had not occurred by that time. 
These censored event times, although seem- 
ingly inconclusive, tell us a great deal about the 
distribution of event occurrence because they 
establish that participants did not experience 
the target event at any time up to and including 
their time of censoring. 
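The recording rule described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the `EventRecord` class, the `record_event` helper, and the use of age 21 as the end of observation are hypothetical, and the sketch ignores the case in which a person leaves the risk set before the observation period ends.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventRecord:
    person_id: int
    time: int       # event time, or censoring time if the event was not observed
    censored: bool  # True when the target event had not occurred by `time`


def record_event(person_id: int, event_time: Optional[int],
                 end_of_observation: int) -> EventRecord:
    """Assign an observed or a censored event time (hypothetical helper)."""
    if event_time is not None and event_time <= end_of_observation:
        # Event seen inside the observation window: keep its actual time.
        return EventRecord(person_id, event_time, censored=False)
    # Otherwise the person is censored at the end of the observation period.
    return EventRecord(person_id, end_of_observation, censored=True)


print(record_event(1, 16, end_of_observation=21))
print(record_event(2, None, end_of_observation=21))
```

Keeping the censoring flag next to the time preserves the information that the event had not occurred up to that point, so censored cases are neither discarded nor treated as events.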

In some research, investigators can record 
event occurrence very precisely indeed. When 
studying the relationship between experiences 
of childhood adversity and subsequent death, 
for example, Friedman, Tucker, Schwartz and 
Tomlinson-Keasey (1995) used public records 
to determine the precise time—in years, 
months, and even days—when each individ- 
ual who had died had actually passed away. 
Other researchers can record only that the target 
event occurred within some discrete-time inter- 
val. A researcher might know, for example, the 
year that a person first experienced depressive 
symptoms or first had sexual intercourse, the 
month when an individual began a new job, or 
the grade when a youngster transitioned from 
adult-supervised care to self-care. We distinguish
between these two scales of measurement
by calling the former continuous-time data and
the latter discrete-time data.

In this chapter, we focus on statistical methods
for analyzing event occurrence recorded
in discrete time. We have several reasons
for our emphasis. First, we have found that 
discrete-time methods are intuitively more 
comprehensible than their continuous-time 
cousins, facilitating initial mastery and subse- 
quent transition to continuous-time methods. 
Second, we believe that discrete-time methods 
are highly appropriate for much of the event 
history data being collected naturally by social 
scientists because, for logistical and financial 
reasons, these data are often recorded only in 
terms of discrete intervals (see Lin, Ensel and 
Lai, 1997). Third, the discrete-time approach 
facilitates inclusion of both time-invariant 
and time-varying predictors, whereas inclu- 
sion of the latter is more difficult under 
the continuous-time approach. Thus, with 
discrete-time survival analysis, researchers can 
easily examine the impact of predictors, such 
as family structure and employment status, 
whose values fluctuate naturally over the life 
course. Fourth, discrete-time survival analysis 
explicitly fosters inspection of how patterns in 
the risk of event occurrence unfold over time, 
whereas the most popular continuous-time 
survival-analytic strategy (“Cox regression”; 
Cox, 1972) ignores the shape of the temporal 
risk profile entirely in favor of estimating the 
influence of predictors on that profile, under 
a restrictive assumption of “proportionality.” 
Fifth, under the discrete-time approach, the 
proportionality assumption is easily assessed 
and “nonproportional” models specified, fitted, 
and interpreted. Finally, in discrete-time sur- 
vival analysis, all model-fitting and parameter 
estimation can be conducted using standard 
statistical software that has been designed 
for standard logistic regression analysis. The 
researcher can thus avoid reliance on the 
dedicated computer software required for 
continuous-time survival analyses and employ 
good practices of sensible data analysis, as all 
of the investigator’s usual analytic skills can
be brought to bear. In our case, we used the
Statistical Analysis System, PC-SAS, Version 
9.1 (SAS Institute, 2005) to fit our hypothesized 
discrete-time hazard models. 


3 Descriptive analysis of 
discrete-time survival data 


The hazard probability and the survival prob- 
ability are the two fundamental quantities at 
the center of all discrete-time survival anal- 
ysis. Estimates of these probabilities provide 
answers to the two key questions that are usu- 
ally asked of event history data: “When is the 
target event most likely to occur?” and “How 
much time passes before people are likely to 
experience the event?” respectively. In what fol- 
lows, we introduce these two quantities and 
illustrate how they can be estimated and inter- 
preted to address such questions. 

In our explanation, we make use of an exam- 
ple of retrospective data gathered on a sample 
of 618 university students from a large, mid- 
western public university. These students were 
invited to earn extra classroom credit by completing
a confidential web survey (available
online from August 30 to October 31,
2004) about their sexual history, level of
religiosity, the quality of their romantic attachment
style, and selected demographic information.
The sample included 444 women (72%) and 
174 men (28%). Most respondents (87%) had 
at least a high-school-level education. Eighty- 
six percent of the participants reported their 
race as Caucasian and the remaining 13% of 
respondents were African-American (5%), His- 
panic (3%), Asian or Pacific Islander (3%), or 
other (2%). The majority of the respondents 
(86%) were single, 9% were engaged to be mar- 
ried, 4% were married, and 1% were divorced 
(Canino, 2005). 


3.1 Person-period dataset 


An important precursor to any kind of data 
analysis is to establish a sensible format for 
storing the data that you intend to analyze. 
To conduct discrete-time survival analyses, you 
must assemble your data on event occurrence 
in a person-period format in which each per- 
son contributes one record (row) to the dataset 
for each discrete-time period in which he/she 
is at risk for event occurrence. In Table 27.1, 
we provide an example of the person-period 
format using data on two participants from our 
“first experience of sexual intercourse” dataset. 
In this dataset, time is divided into discrete 
yearly “bins,” corresponding to ages 13 through 
21, none of the participants having experienced 
the target event at age 12 or earlier. Each par- 
ticipant then contributes rows to the dataset, 
corresponding to the years in which he or she is 
at risk of first experiencing sexual intercourse. 
For instance, notice that person #4 contributes 
three rows to the dataset, for ages 13 through 15, 
and person #151 contributes nine rows, for ages 
13 through 21. Other participants contributed 
in a similar fashion, the number of rows that 
each added being determined by their sexual 
history, or the occurrence of censoring. 
Beyond the first two columns, each sub- 
sequent column of the dataset then contains 


values of several classes of variables that we 
incorporate into subsequent analyses. These 
variables record important features of the prob- 
lem under investigation, including: (a) the pas- 
sage of time, (b) the occurrence of the target 
event (or the occasion of censoring), and (c) the 
values of important predictors that ultimately 
become the centerpiece of our discrete-time sur- 
vival analysis. We discuss each of these three 
classes of variable, briefly, below. 

Ultimately, we will treat participant age as 
a predictor in a forthcoming discrete-time sur- 
vival analysis to investigate how participants’ 
risks of initial sexual experience differ with 
age. And, as you will see, it will prove con- 
venient at that point to represent participant 
age in its most general specification—as a sys- 
tem of dummy variables. Thus, in columns #3 
through #11 of the person-period dataset, we 
use an alternative specification to denote partic- 
ipants’ ages in each discrete-time period. Rather 
than record the value of the participant’s age 
as a (continuous) yearly value, as in column 
#2, we have created a system of dichotomous 
time predictors, labeled A13 through A21. The 
values of these dichotomies are set to indicate 
the participant age to which each discrete-time 
period pertains. In the row corresponding to 
age 13, for instance, dummy variable A13 takes 
on the value of 1 and all other time indicators, 
A14 through A21, are set to zero. At age 14, 
dummy variable A14 is set to 1, with A13 and 
A15 through A21 assuming the value zero, and 
so on. Later, we will choose to use this set of 
time-dummies as initial predictors in discrete-time 
hazard modeling to establish the age-profile of 
the risk of first sexual intercourse. Here, though, 
we simply emphasize that discrete-time survival 
analysis does not require that you specify the 
time predictor in this very general way as a set 
of dummy predictors; there would be nothing to 
prevent you from using participants’ (linear) AGE 
as a predictor in subsequent survival analysis, 
for instance. However, in our empirical research, 
we have found that the general specification of 
time established here, and illustrated in 
Table 27.1, usually provides the most successful 
starting point for discrete-time survival analysis, 
given the typically irregular nature of any risk 
profile with age. For this reason, we always 
recommend that you establish a general 
specification for the time predictor up-front, 
when you first set up your person-period dataset. 

Table 27.1 Records (rows) that a pair of adolescents (#4 and #151) contribute to the complete person-period 
dataset containing longitudinal discrete-time information on the occurrence and timing of first sexual 
intercourse as a function of self-reported attachment style and religiosity 

In the twelfth column of our person-period 
dataset in Table 27.1, we include the all- 
important “event” indicator, which we have 
labeled EVENT. This variable will ultimately 
serve as the outcome variable in our discrete- 
time survival analyses of the relationship 
between risk of first sexual experience and pre- 
dictors. EVENT is also a dichotomous variable, 
coded so that it takes on the value 0 at each 
age in which the respondent did not experi- 
ence the target event and 1 at the single age at 
which the event was experienced. A key fea- 
ture of the person-period dataset is that, for 
each participant, once the event indicator has 
been coded 1 (and the target event has therefore 
occurred), no additional records are included 
in the person-period dataset for that individual. 


An individual who experiences the event of 
interest—in this case, first experience of sexual 
intercourse—is no longer at risk of subsequent 
initiation, by definition, and therefore drops out 
of the risk set for this event. In our example, 
person #4 experienced sexual intercourse at the 
age of 15, thus EVENT takes on the value “0” 
in each of the time periods prior to age 15, 
but switches to value “1” in the discrete-time 
bin corresponding to age 15. Then, once per- 
son #4 has experienced the target event, he is 
no longer at risk of experiencing sexual inter- 
course for the first time ever again, thus he con- 
tributes no further records to the person-period 
dataset. By contrast, person #151 had not expe- 
rienced sexual intercourse for the first time by 
age 21. For her, EVENT is coded “0” in all nine 
discrete-time periods in the dataset, from ages 
13 through 21, and its value never switches 
from 0 to 1, meaning that the study ended with- 
out her experiencing the target event. She is 
therefore censored at age 21. 

Finally, the person-period dataset also 
contains the values of predictors whose rela- 
tionship with the risk of first sexual experi- 
ence is under investigation. In our example, we 
have chosen to work with two important time- 
invariant predictors, representing the romantic 
attachment style and religiosity of the par- 
ticipants, although it would have been easy 
to include many others. Romantic attachment 
style is a categorical predictor describing 
three distinct states of romantic attachment, 
labeled “secure,” “avoidant,” and “preoccu- 
pied” (based on Brennan, Clark and Shaver’s 
[1998] experiences in close relationships [ECR] 
measure, which assesses adult and adolescent 
romantic relationship attachment styles). Of the 
618 adolescents in our sample, 214 were catego- 
rized with a secure attachment style, 201 with 
a preoccupied style, and 203 with an avoidant 
style. In our analyses, in order to include attach- 
ment as a predictor in subsequent discrete-time 
survival analyses, we created three dummy pre- 
dictors to represent each of the three attachment 
styles, but included only AVOID and PREOCC 
in our dataset and models, omitting SECURE as 
the reference category. The values of these two 
predictors are listed for adolescents #4 and #151 
in columns 13 and 14 of Table 27.1; because we 
consider attachment style time-invariant in our 
analyses, the values of AVOID and PREOCC are 
identical over all discrete-time periods, within- 
individual. Person #4 has a secure romantic 
attachment style, as both AVOID and PREOCC 
are coded zero, for him; person #151 has an 
avoidant attachment style. 

Our second major predictor is the adoles- 
cent’s religiosity, represented by the time- 
invariant continuous variable, RELIG, which 
was self-reported on a five-item scale measur- 
ing the person’s degree of religious commitment 
with the Duke Religion Index (DUREL; Koenig, 
Patterson and Meador, 1997). In Table 27.1, 
notice that adolescent #4 self-reports a higher 
level of religiosity than adolescent #151 (val- 
ues of 18 versus 12), but that the values of the 
variable are again time-invariant in both cases. 
Although we do not illustrate it here, inspec- 
tion of the person-period format in Table 27.1 
should easily convince you that it is a small step 
to the inclusion of additional time-varying pre- 
dictors in the dataset, and consequently in any 
subsequent discrete-time hazard models. The 
values of time-varying predictors would simply 
differ from row to row across the person-period 
dataset, within-person. 
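
The person-period layout described in this section can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the chapter's dataset: the function name `person_period_rows` and its arguments are hypothetical, and the column names (ID, AGE, A13 through A21, EVENT, AVOID, PREOCC, RELIG) simply follow the description of Table 27.1. The two example records use the values reported in the text for persons #4 and #151.

```python
def person_period_rows(person_id, last_age, experienced_event, covariates,
                       first_age=13, final_age=21):
    """Build one person's records: one row per age at which he or she is at risk.

    last_age is the age at which the event occurred or, for a censored
    person, the last age at which he or she was observed (hypothetical helper).
    """
    rows = []
    for age in range(first_age, last_age + 1):
        row = {"ID": person_id, "AGE": age}
        # General specification of time: one dummy per discrete period.
        for a in range(first_age, final_age + 1):
            row[f"A{a}"] = 1 if a == age else 0
        # EVENT is 1 only in the single period in which the event occurs.
        row["EVENT"] = 1 if (experienced_event and age == last_age) else 0
        # Time-invariant predictors repeat unchanged on every row.
        row.update(covariates)
        rows.append(row)
    return rows


# Person #4: secure style, RELIG = 18, event at age 15.
person_4 = person_period_rows(4, 15, True,
                              {"AVOID": 0, "PREOCC": 0, "RELIG": 18})
# Person #151: avoidant style, RELIG = 12, censored at age 21.
person_151 = person_period_rows(151, 21, False,
                                {"AVOID": 1, "PREOCC": 0, "RELIG": 12})

print(len(person_4), len(person_151))
```

A time-varying predictor would fit the same layout; its value would be assigned per row inside the loop instead of being copied from a single dictionary.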


3.2 Hazard probability 


Once the discrete-time event history data have 
been formatted and recorded appropriately in a 
person-period dataset you can begin to inves- 
tigate the occurrence and timing of the tar- 
get event by addressing the “Whether?” and 
“When?” questions to which we have alluded 
in our introduction. However, in discrete-time 
survival analysis, and because of the ubiqui- 
tous presence of censoring, we do not attempt to 
summarize and analyze time-to-event directly. 
Instead, when investigating the occurrence of 


a target event—like the event of “first expe- 
rience of sexual intercourse” in our sample 
of adolescents—we begin by figuring out how 
the “risks” of event occurrence are patterned 
over time. Here, for example, we investigate 
at what ages adolescents are at greatest risk of 
first experiencing sexual intercourse, attempt- 
ing to discern whether it is during their early 
teens, during their late teens, or in their twen- 
ties. Determining how the “risk” of first sexual 
intercourse differs with adolescent age will then 
ultimately provide us with answers to the orig- 
inal research questions that we posed about the 
“Whether?” and “When?” of sexual initiation, 
as we show below. 

But how can we use event-history data like 
these to summarize best the risk of event occur- 
rence across age, especially if some of the par- 
ticipants have censored event times? We begin 
by introducing a fundamental statistical quan- 
tity called the hazard probability to represent 
the risk of event occurrence in each time period. 
The population hazard probability in the jth 
discrete-time period, labeled $h(t_j)$, is defined 
as the conditional probability that a randomly 
selected person will experience the target event 
in the jth time period, given that he or she has 
not experienced the event in an earlier time 
period. Ultimately, you will come to realize that 
the concept of hazard probability underpins all 
of discrete-time survival analysis. 

The hazard probabilities describe the risk of 
event occurrence in each discrete-time period 
either as parameters in the population or as estimates 
of those parameters in the sample. The 
estimation of hazard probability in a sample is 
straightforward and intuitive. In each discrete- 
time period, you simply identify the pool of 
people who still remain “at risk” of experienc- 
ing the event in that period—these are the indi- 
viduals who have reached this particular time 
period without already experiencing the event 
or being censored. They are referred to as the 
“risk set.” Then, you compute the proportion of 
this risk set that actually experiences the target 
event in the time period, thereby obtaining an 
estimate of the hazard probability for this par- 
ticular discrete-time period, as follows: 


\[
\hat{h}(t_j) = \frac{\text{number in sample risk set who experience the target event in the } j\text{th time period}}{\text{number in sample risk set in the } j\text{th time period}} \qquad (1)
\]


Anyone who experiences the event, or is 


censored, within the current period is removed 
from the risk set for the following time period 
and is therefore not included in the estimation 
of the hazard probability for that subsequent 
period. Notice how this definition of hazard 
probability is inherently conditional: it is only 
those individuals who have not experienced 
the target event, or have not been censored, 
in an earlier discrete-time period, who can 
participate in the estimation of the hazard 
probability in a subsequent time period. 
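
As a rough illustration of equation (1), the sketch below counts, for each age, the rows in a person-period dataset (the risk set) and the rows with EVENT = 1, and divides one by the other. The function name `sample_hazard` and the five-person toy dataset are invented for this example and are not drawn from the chapter's data.

```python
from collections import defaultdict


def sample_hazard(rows):
    """Estimate the hazard in each period as events / size of the risk set."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for row in rows:
        at_risk[row["AGE"]] += 1          # every row is one person at risk
        events[row["AGE"]] += row["EVENT"]
    return {age: events[age] / at_risk[age] for age in sorted(at_risk)}


# Toy data (hypothetical): (last observed age, experienced the event?)
toy_people = [(13, True), (14, False), (14, True), (15, True), (15, False)]

# Expand to person-period rows; people leave the risk set after the
# period in which they experience the event or are censored.
toy_rows = [{"AGE": age, "EVENT": int(event and age == last)}
            for last, event in toy_people
            for age in range(13, last + 1)]

hazard = sample_hazard(toy_rows)
print(hazard)
```

In the toy data the risk set shrinks from five people at age 13 to four at age 14 and two at age 15, so the censored cases contribute to every denominator up to the period in which they are censored, and to none afterwards.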

We often plot the complete set of hazard 
probabilities against the time period to which 
each refers, joining the plotted points with 
line segments, yielding a profile of risk with 
age or hazard function. In the left-hand plot 
in the top panel of Figure 27.1, we present 
sample hazard functions for our adolescents 
who are approaching sexual initiation. We esti- 
mated these sample hazard probabilities sep- 
arately for adolescents with each of the three 
different romantic attachment styles—secure, 
avoidant, and preoccupied—using contingency 
table analysis (for a description of how to imple- 
ment this approach see Singer and Willett, 
2003). The obtained sample hazard functions 
describe the “risk” of first experiencing sex- 
ual intercourse at each of nine successive dis- 
crete ages, 13 through 21. Inspection of these 
functions helps pinpoint when the target event 
is most likely, and least likely, to occur. For 
instance, notice that for adolescents of all three 
romantic attachment styles, the risk of experi- 
encing first sexual intercourse is relatively low 


at ages 13 and 14, generally increases between 
age 15 and 17 (with a small decrease for preoc- 
cupied adolescents at age 17 that may be due 
to sampling idiosyncrasy), and then peaks at 
age 18. After this age, the risk of being sexually 
initiated, among those adolescents who have 
not yet experienced initial intercourse, declines 
but, by age 21, still remains at levels greater 
than those experienced in early adolescence. 
Beyond this overall temporal profile of risk, and 
ignoring minor differences in shape by group, 
notice that across time an interesting aggregate 
difference in risk by attachment style occurs. 
Preoccupied adolescents appear to be consis- 
tently at the greatest risk for sexual initiation, 
avoidant adolescents appear least at risk, and 
secure adolescents enjoy a more intermediate 
risk. We return to these aggregate differences by 
attachment style later, as a way of generating 
formal statistical models of the hypothesized 
relationship between the risk of sexual initia- 
tion and predictors, like attachment style. 

The “conditionality” inherent in the defini- 
tion of hazard is central to discrete-time sur- 
vival analysis. It ensures that all individuals— 
whether they are ultimately censored or expe- 
rience the target event—remain in the risk set 
until the last time period in which they are eli- 
gible to experience the event (at which point 
they are either censored or they experience the 
target event). For example, the sample hazard 
probability of initial sexual intercourse at age 18 
for adolescents with a secure attachment style is 
estimated, conditionally, using data on all those 
secure age 18 adolescents (124 out of the initial 
sample of 214) who had not yet first experi- 
enced sexual intercourse at an earlier age and 
who remained at risk at age 18. Of these 124 
secure at-risk adolescents, 42 had sex for the 
first time at age 18, leading to an estimated sam- 
ple hazard probability of 42/124, or 0.34. The 
conditionality of the definition of hazard proba- 
bility is crucial because it ensures that the sam- 
ple hazard probability deals evenhandedly with 
censoring—using all the information available 
in the sample event histories by including 
both uncensored and censored cases in the 
risk set in any time period but not overextending 
this information beyond the time when 
the case can legitimately contribute data to the 
analysis. 

Figure 27.1 Hazard and survivor functions describing how the risk of first sexual intercourse depends upon 
adolescent age for 618 college students in a large mid-western university, by romantic attachment style (A = 
Avoidant; P = Preoccupied; S = Secure). The left-hand panel presents sample functions; the right-hand panel 
presents fitted functions from Model 1 of Table 27.2. 


3.3 Survival probability 


In addition to using hazard probability to 
explore the conditional risk of event occur- 
rence in each discrete time-period, you can also 
cumulate the period-by-period risks to provide 
a picture of the overall proportion of the starting 
samples that “survive” through each discrete 
time-period—i.e., that do not experience the 
event up through the time period in question. 
This quantity is referred to as the survival prob- 
ability. In any given discrete-time period, it rep- 
resents the probability that a randomly selected 
population member will continue beyond the 
current time period without experiencing the 
target event. 

We can obtain values of the survival probabil- 
ity easily, in each discrete time-period, by accu- 
mulating the impact of the consecutive hazard 
probabilities. For instance, to begin, in the pop- 
ulation, no one has yet experienced the tar- 
get event, thus the survival probability is 1.00 
or 100%, by definition, at the origin of time. 
In the first discrete time-period in which target 
events can occur, then, the risk set contains 
all members. The hazard probability is 
$h(t_1)$, however, and therefore its complement, 
i.e., $\{1 - h(t_1)\}$, describes the proportion of the 
1st-period risk set that do not experience the 
target event in the period. Providing that censoring 
has occurred at random, then $\{1 - h(t_1)\}$ 
of the original 100% of members must survive 
through the first discrete time-period, and the 
corresponding survival probability for the 1st 
discrete time-period, which we label $S(t_1)$, is 
simply equal to $\{1 - h(t_1)\}$ of 1.00. We can write 
this as $\{1 - h(t_1)\} \times 1.00$, or simply $\{1 - h(t_1)\}$. 
Similarly, in the 2nd discrete time-period, hazard 
probability becomes $h(t_2)$, indicating that a 
fraction $\{1 - h(t_2)\}$ will now survive the period. 
Hence, the survival probability for the 2nd 
period must be $\{1 - h(t_2)\}$ of the proportion who 
survived the 1st period. This quantity is equal 
to $\{1 - h(t_2)\}$ of $S(t_1)$, or $\{1 - h(t_2)\}S(t_1)$. The 
same algorithm can be repeated in each dis- 
crete time-period, successively, such that the 
survival probability in any discrete time-period 
is simply equal to the complement of the hazard 
probability in that period multiplied by the sur- 
vival probability in the previous period. In the 
population, this algorithm can be represented, 
as follows: 


\[
S(t_j) = \{1 - h(t_j)\}\, S(t_{j-1}) \qquad (2)
\]


Sample estimates can be obtained by simply 
substituting the corresponding sample statistics 
into this formula (for a lengthier description of 
this accumulation method and examples, see 
Singer and Willett, 2003). 
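
The accumulation in equation (2) amounts to a running product, which can be sketched as follows. The function name `survivor_function` and the three hazard values are hypothetical and serve only to show the arithmetic.

```python
def survivor_function(hazards):
    """Apply S(t_j) = {1 - h(t_j)} * S(t_{j-1}), starting from S = 1.00."""
    survival = []
    s = 1.0  # at the origin of time, no one has experienced the event
    for h in hazards:
        s = (1.0 - h) * s
        survival.append(s)
    return survival


# Hypothetical hazard estimates for three successive periods.
toy_hazards = [0.2, 0.25, 0.5]
toy_survival = survivor_function(toy_hazards)
print([round(s, 2) for s in toy_survival])
```

With these values, 80% survive the first period, three quarters of those (60% overall) survive the second, and half of those (30% overall) survive the third.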

In parallel with our earlier usage for hazard, 
we use the term survivor function to refer to a 
display of the complete set of survival probabil- 
ities plotted against the time periods to which 
they belong. In the left-hand plot of the bot- 
tom panel of Figure 27.1, we display sample 
survivor functions describing the cumulative 
sexual initiation of adolescents with different 
romantic attachment styles. We obtained these 
estimated functions by applying the algorithm 
in equation (2) to the sample hazard func- 
tions in the top left plot. The sample survivor 
functions indicate the proportion of adolescents 
who “survived” (did not experience sexual 
intercourse) through each successive time 
period from age 13 through 21. Notice that 
the curves are fairly high in the beginning of 
time, and close to a value of 1. Subsequently, 
in each of the three groups defined by attach- 
ment style, the sample survivor functions drop 
as time passes. At the beginning of time (age 
12), all adolescents are “surviving”—none of 
them has had sexual intercourse—thus the sam- 
ple survival probabilities are 1.00, by defini- 
tion. Over time, as adolescents initiate sexual 
activity, the sample survivor functions drop. 
Because most adults do end up having sex- 
ual intercourse at some time in their lives, 
the curves do tend to decline towards a lower 
asymptote of zero, ending in this sample at 
sample survival probabilities of .18 for secure, 
.12 for preoccupied, and .31 for avoidant ado- 
lescents, by age 21. These sample proportions 
indicate that, by the end of their 21st year, an 
estimated 18% of secure, 12% of preoccupied, 
and 31% of avoidant adolescents had not expe- 
rienced sexual intercourse. By subtraction, we 
therefore know that 82% of secure, 88% of preoccupied, 
and 69% of avoidant adolescents had 
experienced sexual intercourse at some point 
before the end of their 21st year. Notice that all 
sample survivor functions tend to have a similar 
shape — a monotonically nonincreasing func- 
tion of age or time. The details of the shape and 
the rate of decline, however, can differ consid- 
erably across groups. For example, although the 
three sample survivor functions in Figure 27.1 
have a similar monotonically declining trajec- 
tory, the sharper decline among preoccupied 
adolescents suggests that, in comparison to 
secure and avoidant adolescents, they are more 
rapidly sexually initiated between the ages of 
13 and 21. 

Having examined these details of the risk of 
sexual initiation, we can now respond specifi- 
cally to the “how long” question: On “average,” 
how long does it take before an adolescent has 
sexual intercourse for the first time? Such ques- 
tions cannot be answered by sample averages, 
because of the presence of censoring, but they 
can be answered by the estimation of a median 
lifetime from the sample survivor function. The 
median lifetime is the length of time that must 
pass until the value of the survivor function 
reaches one half, or .50. In other words, it is 
the time by which half of the individuals in 
the study have experienced the target event. In 
our example, for secure adolescents at the end 
of age 17, the sample survivor function is just 
above .50. At the end of age 18 it is less than 


.50. We can therefore use linear interpolation 
to estimate that this group has a median lifetime 
of 17.4 years, indicating that secure adolescents 
wait, on average, until they are almost 
17½ years old before initiating sexual intercourse. 
In the preoccupied and avoidant groups 
of adolescents, the respective sample estimates 
of median lifetime differ somewhat from the 
secure group and are about 17 years and almost 
18 years, respectively. Be warned, though: for 
target events that are rare, you may not be able 
to estimate a median lifetime because the sur- 
vivor function may not drop below its halfway 
point by the end of the observation period. 
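
The linear interpolation described above can be sketched as follows. The function name `median_lifetime` is hypothetical, and the survival values (0.74, 0.58, and 0.38 at ages 16, 17, and 18) are invented to resemble the pattern described for the secure group; they are not the chapter's estimates. The origin of time is taken to be age 12, where the survival probability is 1.00 by definition.

```python
def median_lifetime(ages, survival, origin_age=12):
    """Interpolate linearly to the time at which the survivor function hits .50.

    Returns None when the survivor function never reaches .50, as can
    happen for rare events.
    """
    prev_age, prev_s = origin_age, 1.0
    for age, s in zip(ages, survival):
        if s <= 0.5:
            # Fraction of the interval needed to fall from prev_s to .50.
            fraction = (prev_s - 0.5) / (prev_s - s)
            return prev_age + fraction * (age - prev_age)
        prev_age, prev_s = age, s
    return None


# Hypothetical survival probabilities at the end of ages 16, 17, and 18.
estimate = median_lifetime([16, 17, 18], [0.74, 0.58, 0.38])
print(round(estimate, 1))

# A rare event: the survivor function never drops to .50.
print(median_lifetime([13, 14], [0.95, 0.90]))
```

The second call returns `None`, which corresponds to the warning above: no median lifetime can be estimated when the survivor function stays above its halfway point throughout the observation period.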


4 Modeling event occurrence as a 
function of predictors 


Estimating sample hazard and survivor func- 
tions is a useful tool for exploring whether 
and when a group of individuals is likely to 
experience a target event, during the window 
of observation. These descriptive statistics can 
also be used to explore questions about differ- 
ences between groups. When do children repeat 
a grade in school, and are maltreated chil- 
dren more likely than nonmaltreated children 
to experience this event (Rowe and Eckenrode, 
1999)? When do adolescents first initiate sex- 
ual intercourse, and are adolescents with high 
levels of religiosity less likely to initiate sex- 
ual activity than those with low levels of 
religiosity? Both of these examples implicitly 
frame selected individual characteristics—such 
as child maltreatment and religiosity—as pre- 
dictors of the risk profile describing the occur- 
rence of the target event, grade repetition, and 
sexual initiation respectively. In fact, when we 
inspect the three sample hazard functions dis- 
played in the upper left of Figure 27.1 and com- 
pare them to each other, we, too, are implicitly 
treating attachment style as a predictor of the 
risk profile of first sexual intercourse. We con- 
clude that a relation may exist between the risk 
profile and attachment such that adolescents 
with a preoccupied attachment style appear to 
be more likely to experience sexual initiation 
at all ages. 

“Eyeball” comparisons like these lack the 
credibility of formal statistical tests, however, 
making it difficult to account for the impact of 
sampling idiosyncrasy and to generalize back 
to the underlying population. Consequently, as 
in most quantitative analysis, our natural next 
step is to specify a formal statistical model that 
expresses our hypotheses about the relationship 
between the risk profile and predictors. If we 
are successful, we will be able to fit any such 
models to data in our person-period dataset, 
obtain formal estimates of parameters describ- 
ing the impact of hypothesized predictors on 
the risk of event occurrence, conduct tests of 
that impact, and make inferences back to the 
underlying population. 

But what is an appropriate form for a statis- 
tical model of the discrete-time hazard prob- 
ability as a function of predictors? We can 
motivate our specification of the forthcoming 
models by examining the three sample haz- 
ard functions presented in the left-hand side 
of the top panel of Figure 27.1, on whose rel- 
ative elevations we have already commented. 
Recall that we have coded, and inserted into 
our person-period dataset, two dichotomous 
predictors, AVOID and PREOCC, whose values 
distinguish adolescents with the avoidant and 
preoccupied attachment styles from the omit- 
ted category, a secure attachment style. Thus, 
when AVOID = 1 and PREOCC = 0, the ado- 
lescent has an avoidant attachment style; when 
AVOID = 0 and PREOCC = 1, he or she has 
a preoccupied attachment style; when AVOID 
and PREOCC are both zero, the adolescent has 
a secure attachment style. 

Let us now imagine that we want to spec- 
ify a statistical model that sensibly expresses 
the relationship between discrete-time hazard 
probability—the conceptual “outcome” of the 
analysis—and adolescent attachment style, rep- 
resented by its two dummy predictors, AVOID 


and PREOCC. Ignoring minor differences in the 
shapes of the hazard functions for a moment, 
notice how attachment style appears to impact 
the sample hazard functions in the top left corner 
of Figure 27.1. The sample hazard function 
for preoccupied adolescents (AVOID = 0 
and PREOCC = 1) is generally at a “higher” 
elevation relative to the sample hazard func- 
tion for secure adolescents (AVOID = 0 and 
PREOCC = 0), which in its turn seems placed 
“higher” than the profile for avoidant adoles- 
cents (AVOID = 1 and PREOCC = 0). So, con- 
ceptually at least, it appears as though the 
effect of dichotomous predictors PREOCC and 
AVOID is to “shift” the sample hazard profiles 
around vertically. How can we capture this 
behavior in a sensible statistical model? If we 
are to develop a reasonable statistical model for 
the relationship between the population hazard 
function and predictors, we must formalize our 
earlier conceptualization by specifying a model 
that permits variation in the values of PREOCC 
and AVOID to displace the hypothesized pop- 
ulation hazard profiles vertically, in some fash- 
ion. We are heartened by the fact that this is not 
unlike the way that the inclusion of a dummy 
predictor, or a pair of dummy predictors, in an 
ordinary linear regression model would shift 
the relationship between a generic outcome Y 
and a continuous predictor X vertically. 

The difference between a discrete-time haz- 
ard and a linear regression analysis, of course, 
is that the discrete-time hazard profile is a set 
of conditional probabilities, each bounded by 
0 and 1. Statisticians modeling a bounded out- 
come, like a probability, as a function of pre- 
dictors generally do not use a standard linear 
regression model to express the hypothesized 
relationship; instead they use a logistic model— 
in which the logit transform of the bounded 
outcome is represented as a linear function 
of predictors. This transformation is chosen 
as suitable for the analysis of outcomes that 
are probabilities because it acts to ensure 
that transformed values of the outcome are 
unbounded and, consequently, it ultimately 
prevents derivation of fitted values that fall out- 
side the permissible range—in this case, 0 and 
1. In fact, it is the implicit use of the logit 
transform in the analysis of outcomes that are 
probabilities that leads to the well-known tech- 
nique of logistic regression analysis (Collett, 
1991; Hosmer and Lemeshow, 2000). The logit 
transformation of the population hazard probability 
in the jth discrete-time period can be represented 
as follows: 


\[
\operatorname{logit} h(t_j) = \log_e\!\left( \frac{h(t_j)}{1 - h(t_j)} \right) \qquad (3)
\]


Notice that, within the brackets on the right- 
hand side of this expression, we form a quotient 
from the hazard probability and its complement. 
This quotient is the ratio of the probability 
that an event will occur, $h(t_j)$, to the 
probability that it will not occur, $1 - h(t_j)$, in 
the jth discrete-time period (given that it had 
not occurred in an earlier time period). In 
other words, we recognize the quotient as 
a standard statistical commodity: the conditional 
odds that the target event will occur in 
this time period. In our example, for instance, 
it is the odds that an adolescent will experience 
sexual intercourse for the first time in the 
jth time period, given that he or she had not 
experienced it earlier. We then take the natural 
logarithm of the conditional odds of event 
occurrence to obtain the logit-transformed hazard, 
or log-odds of hazard. This new quantity ranges 
between minus and plus infinity as hazard 
ranges between 0 and 1 and is therefore 
unbounded. 
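
The logit transformation in equation (3), and its inverse, can be written directly with Python's standard `math` module. The function names `logit` and `inv_logit` are chosen for this sketch; the hazard value used is the sample estimate of 42/124 reported earlier for secure adolescents at age 18.

```python
import math


def logit(h):
    """Natural log of the conditional odds h / (1 - h); requires 0 < h < 1."""
    return math.log(h / (1.0 - h))


def inv_logit(x):
    """Map any real-valued logit back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


h_secure_18 = 42 / 124                     # sample hazard from the text
odds = h_secure_18 / (1.0 - h_secure_18)   # conditional odds, 42/82
log_odds = logit(h_secure_18)

print(round(odds, 2), round(log_odds, 2))
print(round(inv_logit(log_odds), 2))       # recovers the hazard
```

Because `inv_logit` returns a value strictly between 0 and 1 for any finite input, a model that is linear on the logit scale cannot produce fitted hazard probabilities outside the permissible range.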

Now, we can specify the risk of event occur- 
rence as a linear function of predictors without 
having to entertain the possibility that outra- 
geous fitted values will obtain. Thus, instead of 
treating untransformed hazard as the raw out- 
come in our discrete-time hazard models, we 
treat logit-hazard as the new outcome and, in 
our example, we might specify a discrete-time 


hazard model for the risk of sexual initiation 
as a function of adolescent age and attachment 
style for individual i in discrete-time period j, 
as follows: 


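\[
\operatorname{logit} h_i(t_j) = [\alpha_{13}\,\mathit{A13}_j + \alpha_{14}\,\mathit{A14}_j + \cdots + \alpha_{21}\,\mathit{A21}_j] + [\beta_1\,\mathit{AVOID}_i + \beta_2\,\mathit{PREOCC}_i] \qquad (4)
\]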


Ultimately, our intention is to fit this hypoth- 
esized model to the event history data that is 
recorded in our person-period dataset, in order 
to test, estimate, and interpret its parameters as 
a way of addressing our research questions. The 
model postulates that an outcome—in our case, 
the log-odds of hazard—is a linear function of 
two classes of predictors, here distinguished by 
brackets: (a) the adolescent’s age, expressed as 
a system of nine time dummies, A13 through 
A21, and (b) the substantive question predic- 
tors AVOID and PREOCC, jointly representing 
the adolescent’s attachment style. We comment 
on each class briefly below. 
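
To make the structure of model (4) concrete, the sketch below evaluates its right-hand side for a given age and attachment style and converts the result back to a hazard probability. All parameter values are hypothetical, invented purely for illustration; they are not estimates from the chapter, and the names `ALPHA`, `BETA_AVOID`, `BETA_PREOCC`, `logit_hazard`, and `hazard` are likewise assumptions of this example.

```python
import math

# Hypothetical parameter values, NOT fitted estimates from the chapter.
ALPHA = {13: -4.0, 14: -3.2, 15: -2.3, 16: -1.6, 17: -1.2,
         18: -0.7, 19: -0.9, 20: -1.0, 21: -1.1}
BETA_AVOID = -0.4
BETA_PREOCC = 0.3


def logit_hazard(age, avoid, preocc):
    """Right-hand side of model (4) for one person in one period."""
    # First bracket: each alpha multiplies its own time dummy (1 or 0).
    time_part = sum(alpha * (1 if a == age else 0)
                    for a, alpha in ALPHA.items())
    # Second bracket: the attachment-style predictors.
    return time_part + BETA_AVOID * avoid + BETA_PREOCC * preocc


def hazard(age, avoid, preocc):
    """Invert the logit to recover the hazard probability."""
    return 1.0 / (1.0 + math.exp(-logit_hazard(age, avoid, preocc)))


# Secure adolescents (AVOID = 0, PREOCC = 0) get the baseline profile.
print(logit_hazard(13, 0, 0) == ALPHA[13])

# On the logit scale the predictors shift the whole profile by a constant.
shifts = [logit_hazard(a, 0, 1) - logit_hazard(a, 0, 0) for a in ALPHA]
print([round(s, 2) for s in shifts])
```

With both attachment dummies set to zero only the time dummy for the current period survives, so the logit-hazard equals that period's alpha; setting PREOCC to 1 moves every period's logit-hazard by the same amount, which is the vertical displacement the model is designed to capture.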

The first class of predictors on the right- 
hand side of the discrete-time hazard model 
in (4), contained within the first set of brackets, 
provides the baseline logit-hazard profile. 
The slope parameters associated with 
each of the dichotomous time predictors, 
$\alpha_{13}, \alpha_{14}, \ldots, \alpha_{21}$, represent the population 
values of the outcome (now a logit-transformed 
hazard probability) in each discrete-time 
period for the group of secure adolescents (for 
whom AVOID = 0 and PREOCC = 0). Substitut- 
ing these latter predictor values into the model 
for adolescents with a secure attachment style 
yields the following: 


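\[
\operatorname{logit} h_i(t_j \mid \mathit{AVOID}_i = 0;\ \mathit{PREOCC}_i = 0)
  = [\alpha_{13}\,\mathit{A13}_{ij} + \alpha_{14}\,\mathit{A14}_{ij} + \cdots + \alpha_{21}\,\mathit{A21}_{ij}] \qquad (5)
\]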


Working from this reduced model, it is eas- 
ier to see how the remaining model parameters 
work to define the hypothesized population
hazard function for the “baseline” secure group 
of adolescents. Recall that, in the person-period 
dataset, we have coded the values of the nine 
time dummies such that each one separately
takes on the value 1 in the time period to which 
it refers. Thus, at age 13, predictor A13 takes on
the value 1 while the rest of the dummies are 
set to zero (see Table 27.1). Substituting these 
values into (5) reduces the hypothesized model 
to an even simpler form: 


\[
\operatorname{logit} h_i(t_1 \mid \mathit{AVOID}_i = 0;\ \mathit{PREOCC}_i = 0)
  = [\alpha_{13}(1) + \alpha_{14}(0) + \cdots + \alpha_{21}(0)] = \alpha_{13} \qquad (6)
\]


This new discrete-time hazard model specification
ensures that parameter $\alpha_{13}$ simply represents
the value of population logit-hazard (the
transformed risk of event occurrence) at age 13.
Similar substitutions into (5) for each of the
other discrete-time periods confirm that
parameters $\alpha_{14}, \ldots, \alpha_{21}$ represent the
population logit-hazard of event occurrence in the
other discrete-time periods, respectively. Taken
together as a group, then, the $\alpha$-parameters are
simply a logit-transformed population
representation of the hazard function for
adolescents with a secure attachment style, one
per period, whose incarnation in the sample we
have displayed as one of the three curves in the
upper-left plot of Figure 27.1. If we were to fit
this reduced model to data in our person-period
dataset, we could estimate the $\alpha$-parameters,
detransform them (see below), and plot them to
obtain the fitted hazard function for the secure
group.
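The dummy coding recalled above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `person_period_rows` and the three adolescents are hypothetical, and only the column names (A13 through A21, EVENT) follow the chapter.

```python
# Hypothetical illustration: the three adolescents below are invented;
# only the column names follow the chapter's person-period layout.
AGES = range(13, 22)  # discrete-time periods: ages 13 through 21


def person_period_rows(person_id, last_age, experienced_event):
    """Return one record per age at which the person is still at risk.

    last_age is the age at the event, or the last age observed if censored.
    """
    rows = []
    for age in AGES:
        if age > last_age:
            break  # the person has left the risk set
        row = {"ID": person_id, "AGE": age}
        # Time dummies: exactly one of A13..A21 equals 1 in each record.
        for a in AGES:
            row[f"A{a}"] = 1 if a == age else 0
        # EVENT is 1 only in the record for the period of the event.
        row["EVENT"] = 1 if (experienced_event and age == last_age) else 0
        rows.append(row)
    return rows


dataset = (
    person_period_rows(1, 15, True)     # event at age 15: 3 records
    + person_period_rows(2, 21, False)  # censored after age 21: 9 records
    + person_period_rows(3, 13, True)   # event at age 13: 1 record
)

for row in dataset[:3]:
    print(row["ID"], row["AGE"], row["A13"], row["A14"], row["A15"], row["EVENT"])
```

Because exactly one dummy is switched on in each record, a model with no separate intercept assigns one parameter to each period, which is what lets each $\alpha$ be read as that period's baseline logit-hazard.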

The simple discrete-time hazard model in 
(4) also contains question predictors AVOID 
and PREOCC, however, to represent the two 
facets of adolescent attachment style other than 
“secure.” We can thus write down hypothesized 
expressions for the population logit-hazard of 
sexual initiation in these other two groups by 


substituting their respective values of the pre- 
dictors and simplifying, as follows: 


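\[
\operatorname{logit} h_i(t_j \mid \mathit{AVOID}_i = 0;\ \mathit{PREOCC}_i = 1)
  = [\alpha_{13}\,\mathit{A13}_{ij} + \alpha_{14}\,\mathit{A14}_{ij} + \cdots + \alpha_{21}\,\mathit{A21}_{ij}] + [\beta_2] \qquad (7)
\]

\[
\operatorname{logit} h_i(t_j \mid \mathit{AVOID}_i = 1;\ \mathit{PREOCC}_i = 0)
  = [\alpha_{13}\,\mathit{A13}_{ij} + \alpha_{14}\,\mathit{A14}_{ij} + \cdots + \alpha_{21}\,\mathit{A21}_{ij}] + [\beta_1] \qquad (8)
\]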


Notice, now, how the additional $\beta$-parameters
function in the hypothesized discrete-time
hazard model. Comparing (7) and (5), for
instance, you will see that our model
specification permits the logit-hazard functions
for the preoccupied and secure groups to have a
similar temporal profile of risk (embodied in the
magnitudes of the $\alpha$-parameters), but that the
profile for the preoccupied group is “shifted
vertically” by $\beta_2$ in each discrete-time period.
If parameter $\beta_2$ were positive, for instance, the
logit-hazard profile describing the sexual
initiation of preoccupied adolescents would retain
the same shape as the baseline secure group, but
be elevated above it by a distance $\beta_2$ (in units
of logit-hazard). We have observed this behavior
in the sample data (in the upper-left plot of
Figure 27.1), where the (untransformed) hazard
profile for the preoccupied group appears to be
elevated above that of the secure group, ignoring
minor variations in the shape of the risk
profiles. By a similar argument, comparing (8)
and (5), we can argue that slope parameter $\beta_1$
represents the shift associated with being in the
avoidant group (again relative to the baseline
secure group), although inspection of the
upper-left plot in Figure 27.1 suggests that the
sign on this parameter may be negative once the
model has been fitted to data.
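The vertical shift can be made explicit by differencing the two model expressions. The lines below are a worked restatement of the argument, not an additional result; the coefficient on PREOCC is written $\beta_{\mathit{PREOCC}}$ here purely to keep the predictor visible, and the odds-ratio form in the second line follows from exponentiating a difference in log-odds.

```latex
\[
\operatorname{logit} h_i(t_j \mid \mathit{PREOCC}_i = 1)
  - \operatorname{logit} h_i(t_j \mid \mathit{PREOCC}_i = 0)
  = \beta_{\mathit{PREOCC}}
  \quad \text{for every period } j
\]
\[
\frac{h(t_j \mid \text{preoccupied}) \,/\, [1 - h(t_j \mid \text{preoccupied})]}
     {h(t_j \mid \text{secure}) \,/\, [1 - h(t_j \mid \text{secure})]}
  = e^{\beta_{\mathit{PREOCC}}}
\]
```

The time dummies cancel in the subtraction, which is why the shift is the same in every period on the logit scale even though the corresponding gap in untransformed hazard need not be constant.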


4.1 Fitting the discrete-time hazard model 
to data and interpreting the results 


The population discrete-time hazard model 
specified in (4) has features that seem to be 
mathematically sensible, given the sample plots 
we have examined in Figure 27.1. Our purpose
now becomes the fitting of the model to
our event history data in order to obtain rel- 
evant goodness-of-fit statistics, parameter esti- 
mates, and associated inferences with which 
to address the research questions about the 
“whether” and “when” of adolescent sexual 
initiation. While detailed justification of meth- 
ods of statistical estimation is not possible 
here, technical work has shown that model fit- 
ting and parameter estimation can easily be 
conducted by using standard logistic regres- 
sion analysis to regress the binary outcome, 
EVENT, on the time dummies and on the sub- 
stantive predictors in the person-period dataset 
(see Singer and Willett, 2003, for a more com- 
plete explanation). Parameter estimates, stan- 
dard errors, and statistical inference obtained 
in these logistic regression analyses are then 
exactly those that are required by the discrete- 
time survival analysis. We illustrate this here, 
by fitting two discrete-time hazard models 
to our data on adolescent sexual initiation. 
The first model (“Model 1”) contains the age- 
dummies and the AVOID and PREOCC pre- 
dictors and is therefore used to investigate 
the main effects of adolescent age and attach- 
ment style. Our second model (“Model 2”) adds 
the predictor RELIG in order to evaluate the 
marginal main effect of adolescent religiosity. 
The fitted discrete-time hazard models are pre- 
sented in Table 27.2. 
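To make the "logistic regression on the person-period dataset" idea concrete, here is a minimal sketch using only the Python standard library. It is an assumption-laden illustration, not the authors' procedure: the counts are invented, the predictor `G` is a hypothetical binary stand-in for a substantive predictor, and `fit_logistic` is a bare Newton-Raphson routine where real analyses would use standard logistic regression software.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_logistic(X, y, iterations=25):
    """Maximum likelihood logistic regression; no intercept is added."""
    k = len(X[0])
    coef = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(c * x for c, x in zip(coef, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for b in range(k):
                    info[a][b] += p * (1.0 - p) * row[a] * row[b]
        step = solve(info, grad)
        coef = [c + s for c, s in zip(coef, step)]
    return coef


# Hypothetical counts: (period, G) -> (number at risk, number of events)
counts = {(1, 0): (10, 1), (1, 1): (10, 3),
          (2, 0): (9, 2), (2, 1): (7, 3),
          (3, 0): (7, 3), (3, 1): (4, 2)}

X_time, X_full, y = [], [], []
for (period, g), (at_risk, events) in counts.items():
    for n in range(at_risk):
        dummies = [1.0 if period == j else 0.0 for j in (1, 2, 3)]
        X_time.append(dummies)             # time dummies only
        X_full.append(dummies + [float(g)])  # time dummies plus predictor
        y.append(1.0 if n < events else 0.0)

alpha_only = fit_logistic(X_time, y)
full = fit_logistic(X_full, y)
print("time dummies only:", [round(a, 3) for a in alpha_only])
print("with predictor G: ", [round(c, 3) for c in full])
```

With time dummies alone, each fitted coefficient equals the logit of the sample hazard in its period (events divided by the size of the risk set), which is one way to check that the no-intercept specification is behaving as intended.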

In Model 1 of Table 27.2, as explained 
above, the parameter estimates associated with 
each of the time-period dummy predictors, 
A13 through A21, provide the fitted shape of 
the baseline logit-hazard profile for adolescents 


1 To fit the discrete-time hazard model specified
in (4), you must conduct your logistic regression
analyses with the “no-intercept” option selected,
so that no stand-alone intercept is included in
the model specification. In fact, in our model
specification in (4), the $\alpha$-parameters function
as a set of nine intercepts, one per discrete-time
period.


Table 27.2 Parameter estimates, approximate 
p-values, and standard errors from discrete-time 
hazard models representing the risk of first sexual 
intercourse in adolescence as a function of age, 
self-reported attachment style, and religiosity 

(n adolescents = 618, n events = 493) 


Predictor            Discrete-time hazard model
                     (standard errors in parentheses)
                     #1                #2
A13                  -5.72 (0.71)      -4.70 (0.73)
A14                  -3.75 (0.28)      -2.93 (0.32)
A15                  -2.34 (0.16)      -1.64 (0.23)
A16                  -1.50 (0.13)      -0.46 (0.21)
A17                  -1.39 (0.14)      -0.33 (0.22)
A18                  -0.61 (0.13)       0.50 (0.22)
A19                  -0.96 (0.16)       0.16 (0.24)
A20                  -1.30 (0.20)      -0.15 (0.27)
A21                  -1.82 (0.26)      -0.64 (0.32)
AVOID                -0.38 (0.13)      -0.52 (0.13)
PREOCC                0.25 (0.12)       0.14 (0.13)
RELIG                                  -0.06 (0.009)
Deviance statistic   2488.41           2450.89


with a secure attachment style. We can detrans- 
form these estimates back into the world of reg- 
ular hazard in order to present the age-profile 
of the risk of sexual initiation for this group. 
For example, the fitted logit-hazard of sexual 
initiation in the first discrete-time period (i.e.,
at age 13) is estimated as -5.72 in the first row
of Table 27.2, which means that: 


\[
\log_e\!\left(\frac{\hat{h}(t_1)}{1-\hat{h}(t_1)}\right) = -5.72
\]


Exponentiating, cross-multiplying, and rearranging
terms in this expression, we obtain:


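\[
\frac{\hat{h}(t_1)}{1-\hat{h}(t_1)} = e^{-5.72}
\]
\[
\hat{h}(t_1) = e^{-5.72}\,\bigl(1-\hat{h}(t_1)\bigr)
\]
\[
\hat{h}(t_1) = \frac{e^{-5.72}}{1+e^{-5.72}} = \frac{0.00328}{1+0.00328}
\]
\[
\hat{h}(t_1) = 0.0033
\]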


The resulting value indicates that the fitted
hazard probability of sexual initiation at age 13
is very small: only about one-third of one percent
of the secure adolescents who are at risk at this
age are predicted to have intercourse for the
first time. A similar computation, however, provides
the following fitted risk of sexual initiation at 
age 18 for members of the same group: 


\[
\hat{h}(t_6) = \frac{e^{-0.61}}{1+e^{-0.61}} = \frac{0.5434}{1+0.5434}
\]
\[
\hat{h}(t_6) = 0.35
\]


Thus, among secure adolescents who had not 
yet had sexual intercourse prior to age 18, about 
35% had intercourse for the first time at age 18. 
In a similar fashion, we can compute fitted haz- 
ard probabilities at all ages for the secure group 
and, from these fitted values, construct a fitted 
hazard function for the group. We have plotted 
this fitted risk profile as the middle curve in the 
upper-right plot of Figure 27.1. 

To get some sense of the risk of sexual ini- 
tiation for the other attachment subgroups, we 
can examine the parameter estimates associated 
with the two attachment predictors in Model 
1, which estimate the shift in baseline logit- 
hazard from the baseline secure group to the 


avoidant and preoccupied groups, respectively.
The negative parameter estimate associated
with the avoidant group (-0.38) indicates that
adolescents with avoidant attachment styles
are at lower risk of sexual initiation at every
age than are secure adolescents. To obtain their
fitted hazard profile, you would simply subtract
0.38 from the fitted logit-hazard values obtained
for the secure group (i.e., the estimates
associated with the $\alpha$-parameters), then
detransform each back into the world of regular
hazard (as we have shown above), and again plot.
We have displayed this second fitted risk profile
as the lower curve in the upper-right plot of
Figure 27.1. The positive parameter estimate
associated with the preoccupied group (0.25)
indicates that adolescents with preoccupied
attachment styles are, at every age, at greater
risk of engaging in first sexual intercourse than
are secure adolescents. Again, we can combine
this estimate with the estimates associated with
the $\alpha$-parameters and detransform to obtain the
fitted hazard probabilities for the preoccupied
group. We have displayed this third fitted risk
profile as the upper curve in the upper-right
plot of Figure 27.1.
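The shift-then-detransform recipe can be sketched as follows. The four estimates are the Model 1 values quoted in the text (logit-hazards of -5.72 at age 13 and -0.61 at age 18 for the secure group, and shifts of -0.38 and 0.25); the function name `inv_logit` and the dictionary layout are illustrative choices only.

```python
import math


def inv_logit(x):
    """Detransform a logit-hazard back to a hazard probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Model 1 estimates quoted in the text (secure group is the baseline).
baseline_logit = {13: -5.72, 18: -0.61}
shift = {"secure": 0.0, "avoidant": -0.38, "preoccupied": 0.25}

fitted = {
    (group, age): inv_logit(alpha + delta)
    for age, alpha in baseline_logit.items()
    for group, delta in shift.items()
}

for (group, age), hazard in sorted(fitted.items()):
    print(f"{group:12s} age {age}: fitted hazard = {hazard:.4f}")
```

The gap between groups is constant on the logit scale but not on the hazard scale: the same shift moves a hazard near zero (age 13) far less in absolute terms than one near 0.35 (age 18).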

Comparing the right and left panels in the
upper row of Figure 27.1, notice that the fitted
hazard functions on the right side are far
smoother than the sample hazard functions on
the left side. This smoothness results from
constraints inherently imposed in the population
hazard model specified in (4), which force the
vertical separation of the logit-hazard functions 
of any pair of groups to be identical in each 
time period. Just as we do not expect a fitted 
regression line to touch every data point in a 
scatterplot, we do not necessarily expect every 
point on a fitted hazard function to match every 
sample value of hazard at the corresponding 
age. Indeed, in this example, further analysis 
not presented here reveals that the discrepan- 
cies between the sample and fitted plots pre- 
sented in the upper row of Figure 27.1 are due to 
idiosyncratic sampling variation, exactly analogous
to the differences detected between the
fitted and observed values computed in regu- 
lar regression analysis. From these fitted hazard 
functions, we have used the cumulative algo- 
rithm in (2) to obtain companion fitted survivor 
functions for adolescents with the three kinds 
of attachment style—these are presented in the 
lower-right panel of Figure 27.1, and from them 
we could obtain fitted median lifetimes if that 
were required. 
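A minimal sketch of that conversion, assuming that the cumulative algorithm in (2) is the usual recursion in which each period's survival probability is the previous one multiplied by one minus the current hazard. The hazard values below are invented for illustration, and the median lifetime is reported without interpolation, as the first period by which survival has fallen to one-half or below.

```python
def survivor_function(hazards):
    """Cumulate hazards into survival probabilities, one per period."""
    survival, s = [], 1.0
    for h in hazards:
        s *= (1.0 - h)  # survive this period given survival so far
        survival.append(s)
    return survival


def median_lifetime(periods, survival):
    """First period by which survival is at or below 0.5 (None if never)."""
    for period, s in zip(periods, survival):
        if s <= 0.5:
            return period
    return None


ages = list(range(13, 22))
# Hypothetical fitted hazards for ages 13..21, not taken from Table 27.2.
hazards = [0.01, 0.02, 0.08, 0.18, 0.20, 0.35, 0.28, 0.21, 0.14]

survival = survivor_function(hazards)
for age, s in zip(ages, survival):
    print(age, round(s, 3))
print("median lifetime (no interpolation):", median_lifetime(ages, survival))
```

A refinement sometimes used is to interpolate linearly between the two periods that bracket a survival probability of one-half; that step is omitted here to keep the sketch short.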

The parameter estimates associated with the 
avoidant and preoccupied groups can also 
be interpreted by exponentiating them and 
interpreting their anti-logged values as fitted 
odds-ratios, as is common in regular logistic 
regression analysis. Anti-logging the parameter 
estimates, we obtain values of $e^{-0.38}$ and $e^{0.25}$,
or 0.68 and 1.28 respectively, for AVOID and 
PREOCC. At every age, then, the fitted odds 
that an adolescent with an avoidant attachment 
style will have sexual intercourse for the first 
time are about two-thirds of the odds that a 
secure adolescent would do the same, given that 
they had not initiated sex earlier. Similarly, at 
every age, the fitted odds that an adolescent 
with a preoccupied attachment style will have 
sexual intercourse for the first time are slightly 
more than one-and-a-quarter times the odds that 
a secure adolescent would do the same, given 
that they had not initiated sex earlier. These 
are quite large effects, and they are both statis- 
tically significant, as you can tell by compar- 
ing the parameter estimates to their standard 
errors in Table 27.2, or by consulting the asso- 
ciated approximate p-values listed as super- 
scripts there. 
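The anti-logging and the comparison of estimates with their standard errors can be reproduced in a few lines. The estimates and standard errors are the Model 1 values for AVOID and PREOCC; the 95% interval uses the ordinary Wald approximation (estimate plus or minus 1.96 standard errors), which is an assumption added here for illustration rather than something reported in the chapter.

```python
import math

# (estimate, standard error) from Model 1 for the two attachment predictors.
estimates = {"AVOID": (-0.38, 0.13), "PREOCC": (0.25, 0.12)}

results = {}
for name, (b, se) in estimates.items():
    results[name] = {
        "odds_ratio": math.exp(b),          # anti-logged estimate
        "z": b / se,                        # estimate relative to its SE
        "ci_low": math.exp(b - 1.96 * se),  # Wald interval, odds-ratio scale
        "ci_high": math.exp(b + 1.96 * se),
    }

for name, r in results.items():
    print(f"{name}: OR = {r['odds_ratio']:.2f}, z = {r['z']:.2f}, "
          f"95% CI = ({r['ci_low']:.2f}, {r['ci_high']:.2f})")
```

Both ratios of estimate to standard error exceed 1.96 in absolute value, which is consistent with the statement that both effects are statistically significant at the conventional level.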

So, what have we learned by fitting this 
discrete-time hazard model to these person- 
period data? First, we can see the more clearly 
articulated profile of risk across time that 
is revealed by pooling information across all 
individuals in a single analysis, and we can 
use the statistics associated with the parame- 
ter estimates—standard errors and p-values—to 


make reasonable inferences about the popu- 
lation from which these data were sampled. 
Doing so here reveals a clear pattern of risk
that is consistent with the one previous study
that included attachment style as a predictor
of sexual initiation: on average, adolescents
with a preoccupied attachment style are likely
to have sex at an earlier age than adolescents
with either a secure or an avoidant attachment
style (Canino, 2002).
Second, we can quantify the decreased risk of 
sexual initiation for avoidant adolescents, and 
the increased risk for preoccupied adolescents, 
in comparison to secure adolescents, by relying 
on the fitted odds-ratios that we have obtained, 
0.68 and 1.28 respectively. 

The fitting of discrete-time hazard models 
provides a flexible approach for investigat- 
ing what affects event occurrence, and it is a 
method that evenhandedly incorporates data 
from both censored and noncensored individuals,
as we have described. Although these
models may appear unusual at first glance, 
they actually closely mirror the more familiar 
multiple linear and logistic regression models. 
If you know how to conduct multiple linear 
and logistic regression analysis, then you know 
how to do discrete-time survival analysis. Like 
their more familiar cousins, discrete-time haz- 
ard models can incorporate multiple predictors 
simultaneously, by simply adding them to the 
models. Inclusion of multiple predictors per- 
mits examination of the effect of one predictor 
while controlling statistically for the effects of 
others. 

We have illustrated the inclusion of multiple 
predictors by adding predictor RELIG to obtain 
Model 2 in Table 27.2. Notice that the change in 
the deviance statistic between Models 1 and 2
(a difference of 2488.41 - 2450.89 = 37.52, for
a loss of 1 degree of freedom) confirms that
religiosity is a statistically significant predictor 
of adolescent sexual initiation, after controlling 
Figure 27.2 Fitted hazard functions describing how the risk of first sexual intercourse depends upon
adolescent age, by attachment style (secure, avoidant, and preoccupied) and religiosity (low = 9.6; high =
26.4), from Model 2 of Table 27.2.


for attachment style (p < .001).² The parameter
estimate associated with this newly added
predictor is negative, indicating that more reli- 
gious adolescents tend to be at lower risk of 
sexual initiation at each age, and that the antic- 
ipated decrement in logit-hazard is .06 for a 
1-unit difference in adolescent religiosity. In 
fact, in our sample, adolescent religiosity has 
a standard deviation of 5.6, thus we would 
expect that two adolescents whose religiosities 
were a standard deviation different would dif- 
fer in the logit-hazard of sexual initiation by 


2 Differences in deviance statistics between nested
discrete-time hazard models can also be used to 
test general linear hypotheses concerning the joint 
impact of multiple predictors, as in regular linear 
and logistic regression analysis. 


(-0.06 × 5.6) or -0.336. Anti-logging
($e^{-0.336} \approx 0.71$), therefore, we find that
the fitted odds of sexual initiation for an
adolescent who is one standard deviation more
religious are a little under three-quarters of
the odds for a less religious adolescent, at each
age. This same difference, by religiosity, is
evident for adolescents of each attachment style
in the fitted hazard and survivor functions of
Figure 27.2, which we have obtained by the
usual methods.³ Notice the relative differences
in the effect sizes due to attachment style and 
religiosity. 
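The deviance comparison and the standard-deviation effect for RELIG can be checked numerically. This sketch uses the two deviance statistics from Table 27.2 and the rounded coefficient of -0.06; the p-value uses the identity that, for a chi-square variable on 1 degree of freedom, the upper-tail probability at x equals erfc(sqrt(x / 2)). Variable names are illustrative.

```python
import math

deviance_model_1 = 2488.41  # age dummies + AVOID + PREOCC
deviance_model_2 = 2450.89  # Model 1 + RELIG

# Difference in deviance between the nested models (1 added parameter).
difference = deviance_model_1 - deviance_model_2

# Upper-tail probability of a chi-square on 1 df: P(X > x) = erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(difference / 2.0))

# Effect of a one-standard-deviation difference in religiosity (SD = 5.6),
# using the rounded coefficient reported in the table.
logit_shift = -0.06 * 5.6
odds_ratio_per_sd = math.exp(logit_shift)

print(f"deviance difference = {difference:.2f}")
print(f"p-value (1 df)      = {p_value:.2e}")
print(f"logit shift per SD  = {logit_shift:.3f}")
print(f"odds ratio per SD   = {odds_ratio_per_sd:.2f}")
```

Because the coefficient is rounded to two decimals in the table, the odds ratio computed this way may differ slightly from one based on the unrounded estimate.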


3 In Figure 27.2, we plot fitted hazard functions for
prototypical adolescents whose religiosity is 1.5
standard deviations (1.5 × 5.6 = 8.4) below and
above the sample mean value of 18.0, giving values
of 9.6 (“low”) and 26.4 (“high”).
5 Extensions of the basic
discrete-time hazard model 


5.1 Including time-varying predictors 


You can include two very different kinds of 
predictors in the discrete-time hazard mod- 
els that we have introduced: (a) predictors 
whose values are time-invariant, and (b) those 
whose values are time-varying. As befits their 
label, the former are similar to the attach- 
ment style and religiosity predictors whose 
effects we investigated above. They describe 
immutable characteristics of people, such as 
their sex or race, and their values are stable 
across the lifetime. The latter, on the other 
hand, describe characteristics of people whose 
values may fluctuate over the life course, as 
might an individual’s self-esteem, marital sta- 
tus, or income. As we have noted earlier, time- 
varying predictors are easily incorporated into 
the person-period dataset, with their values 
potentially differing from row to row, and from 
there it is a short and obvious step to their inclu- 
sion as predictors in the discrete-time hazard 
models themselves. Singer and Willett (2003) 
present examples of the inclusion of time- 
varying predictors in discrete-time hazard mod- 
els and discuss subtleties of the interpretation 
of their fitted effects. 

The ease with which time-varying predictors 
can be incorporated into discrete-time hazard 
models offers social scientists an innovative 
analytic opportunity. Many important predic- 
tors of the human condition fluctuate naturally 
with time, including family and social struc- 
ture, employment, opportunities for emotional 
fulfillment, and perhaps most importantly, as 
noted above, the occurrence and timing of 
other events. In traditional statistical analyses, 
temporal fluctuation in such predictors must 
often be reduced to a single constant value 
across time, for each person. With the advent 
of discrete-time hazard modeling, this is no 
longer the case. Researchers can examine rela- 


tionships between event occurrence and predic- 
tors whose values are changing dynamically. 

There are at least two reasons why we believe 
that the ability to include time-varying predic- 
tors in discrete-time hazard models represents 
an exciting analytic opportunity for researchers 
wanting to study the occurrence of events 
across the life course. First, these researchers 
often find themselves studying behavior across 
extended periods of time, sometimes encom- 
passing more than 20, 30, or even 40 years. 
Although researchers studying behavior across 
short periods of time may reasonably be able 
to argue that the values of time-varying predic- 
tors are relatively stable during the study period 
(enabling them to make use of time-invariant 
indicators of these time-varying features), the 
tenability of this assumption surely decreases 
as the length of time studied increases. Second, 
many research questions focus on links between 
the occurrences of several different events. 
Researchers ask questions about whether the 
occurrence of one stressful event (e.g., parental 
divorce or death of a spouse) impacts the occur- 
rence of another stressful event (e.g., one’s own 
divorce or the onset of depression). Although it 
is possible to address such questions by com- 
paring the trajectories of individuals who have 
had, and who have not had, the precipitat- 
ing event at any time during the interval cov- 
ered by the observation period, this approach 
requires the researcher to set aside data on all 
individuals who experienced the precipitating 
event during the observation period. By cod- 
ing the occurrence of the precipitating event 
as a time-varying predictor instead, data from 
all individuals may be analyzed simultaneously 
(see Singer and Willett, 2003, for more on this 
topic). 
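Coding a precipitating event as a time-varying predictor might look like the following sketch. Everything named here is hypothetical: `PARDIV` is an invented indicator for parental divorce (one of the examples mentioned above), and the two individuals are made up. The point is only that the predictor's value can change from row to row within a person.

```python
def person_period_with_tv(person_id, first_age, last_age,
                          experienced_event, divorce_age=None):
    """One record per period at risk, with a time-varying indicator.

    PARDIV (hypothetical) is 0 until the period in which the
    precipitating event occurs and 1 from that period onward.
    """
    rows = []
    for age in range(first_age, last_age + 1):
        rows.append({
            "ID": person_id,
            "AGE": age,
            "PARDIV": 1 if (divorce_age is not None and age >= divorce_age) else 0,
            "EVENT": 1 if (experienced_event and age == last_age) else 0,
        })
    return rows


# Person 1: parental divorce at 15, target event at 17.
# Person 2: no parental divorce, censored after age 16.
rows = (person_period_with_tv(1, 13, 17, True, divorce_age=15)
        + person_period_with_tv(2, 13, 16, False))

for r in rows:
    print(r["ID"], r["AGE"], r["PARDIV"], r["EVENT"])
```

Every individual contributes records both before and after any change in the predictor, so no one has to be set aside because the precipitating event happened during the observation period.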


5.2 Using alternative specifications for time 


In the analysis presented here, we used a sys- 
tem of dummy predictors to provide a gen- 
eral specification for the effects of time on the 
risk of event occurrence. Often, however, alternative
specifications of the effect of time are
possible, resulting in more parsimonious and 
equally well-fitting models. Inspection of the 
fitted hazard functions displayed in the upper 
right of Figure 27.1, for instance, suggests that 
the main effect of adolescent age might be effec- 
tively represented by a cubic polynomial func- 
tion which would, in this case, first rise and 
then fall asymmetrically. To test this specific 
function, we could refit either of the models in 
Table 27.2, replacing the complete set of age- 
dummies currently present by an intercept and 
three predictors representing the effects of lin- 
ear, quadratic, and cubic age. Comparing the 
deviance statistics of the old and new models 
would then permit a formal test of whether the 
replacement was acceptable. Subsequent steps 
in the survival analysis could then proceed as 
described above, but now based on the more 
parsimonious representation of the main effects 
of adolescent age. 
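The replacement predictors can be generated directly from age. In this sketch, centring age at 13 and the column names `ONE`, `T`, `T2`, and `T3` are assumptions made for illustration; the degrees of freedom for the deviance comparison are simply the difference in the number of time parameters between the two specifications.

```python
def polynomial_time_predictors(age, centre=13):
    """Intercept plus linear, quadratic, and cubic terms in centred age."""
    t = age - centre
    return {"ONE": 1, "T": t, "T2": t ** 2, "T3": t ** 3}


for age in range(13, 22):
    print(age, polynomial_time_predictors(age))

# The cubic specification is nested within the general dummy specification,
# so the two fitted models can be compared by their deviance statistics.
n_time_parameters_dummies = 9  # A13 through A21
n_time_parameters_cubic = 4    # intercept + linear + quadratic + cubic
df_for_test = n_time_parameters_dummies - n_time_parameters_cubic
print("degrees of freedom for the deviance comparison:", df_for_test)
```

A small and nonsignificant increase in deviance on these degrees of freedom would support the more parsimonious cubic representation.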


5.3 Including interactions among predictors 


Just as in regular multiple linear and logistic 
regression analysis, you can easily incorporate 
two-way, and higher-order, interactions among 
predictors in the discrete-time hazard model. 
The process is identical to that used in regular 
regression analysis—you simply form cross- 
products of the focal predictors in the person- 
period dataset and include those cross-products 
as predictors in the discrete-time hazard model, 
along with their corresponding main effects. In 
our example, for instance, we could have added 
two columns to our person-period dataset in 
Table 27.1 to contain cross-products of AVOID 
and PREOCC with RELIG. Subsequent addition 
of the new pair of AVOID x RELIG and PRE- 
OCC x RELIG two-way interactions to Model 2 
of Table 27.2, to provide a new Model 3, say, 
would have permitted us to investigate whether 
the impact of religiosity on the risk of sex- 
ual initiation differed by adolescent attachment 
style. 
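Forming the cross-product columns is a one-line operation per interaction. The sketch below assumes each person-period record is a dictionary containing AVOID, PREOCC, and RELIG; the column names `AVOID_x_RELIG` and `PREOCC_x_RELIG` and the example record are hypothetical.

```python
def add_interactions(row):
    """Return a copy of a person-period record with two cross-products added."""
    out = dict(row)
    out["AVOID_x_RELIG"] = row["AVOID"] * row["RELIG"]
    out["PREOCC_x_RELIG"] = row["PREOCC"] * row["RELIG"]
    return out


# Hypothetical record for an avoidant adolescent with religiosity 20.
record = {"ID": 7, "AGE": 16, "AVOID": 1, "PREOCC": 0, "RELIG": 20, "EVENT": 0}
expanded = add_interactions(record)
print(expanded)
```

The new columns would then be entered alongside the main effects, and the change in deviance between the models with and without them (on 2 degrees of freedom) would provide the test of whether the effect of religiosity differs by attachment style.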


5.4 Including interactions with time 


When processes evolve dynamically, the effects 
of both time-invariant and time-varying pre- 
dictors may fluctuate over time. A predictor 
whose effect is constant over time has the same 
impact on hazard in all time periods. A pre- 
dictor whose effect varies over time has a dif- 
ferent impact in different time periods. Both 
time-invariant and time-varying predictors can 
have time-varying effects. In the present exam- 
ple, for instance, we would ask whether the 
effect of religiosity on the risk of sexual initia- 
tion differed over time. If the effect of religios- 
ity were time-invariant, its effect on the risk 
of sexual initiation would be the same regard- 
less of the age of the adolescent. If the effect 
of religiosity differed over time, in contrast, 
the predictor could have a larger effect on the 
risk of sexual initiation in the early years of 
adolescence, for example when adolescents are 
still living at home, than during the later teen 
years, when they have already moved out of the 
house. 

The discrete-time hazard models posited so 
far have not permitted a predictor’s effect to 
vary with time; they are called proportional- 
odds models. Hazard profiles represented by 
such models have a special property: in every 
time period (“t”) under consideration, the effect 
of the predictor on logit-hazard is identical. 
In (4), for example, the vertical shift in the 
logit-hazard profile for avoidant adolescents 
is always $\beta_1$ at all ages, and for preoccupied
adolescents is always $\beta_2$ at all ages.
Consequently, the hypothesized logit-hazard profiles
for adolescents with the three different attach- 
ment styles have identical shapes since their 
profiles are simply shifted versions of each 
other. Generally, in proportional-odds models, 
the entire family of logit-hazard profiles rep- 
resented by all possible values of the predic- 
tors share a common shape and are mutually 
parallel, differing only in their relative eleva- 
tions. An alternative to the proportional odds 
model which is commonly used in event history 
analysis is the proportional hazards model,
in which it is the raw hazard rather than
the logit-hazard that is assumed to be
proportional. Such models can be fitted using
generally available logistic regression software,
but using the complementary log-log function
$\operatorname{cloglog}(h(t_j)) = \ln(-\ln(1 - h(t_j)))$ in place of the
logit function in equation (3). This form of
the dependent variable more closely paral- 
lels the proportional hazards assumption in 
the Cox semiparametric event history analysis 
model (see, e.g., Box-Steffensmeier and Young, 
Chapter 26 in this volume) and some parametric 
event history models (see, e.g., Joo, Chapter 27 
in this volume). One can assess empirically 
which of the two models, proportional odds or 
proportional hazards, best fits the data. 
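The two link functions can be compared side by side with a short sketch. The baseline hazards and the shift of 0.25 are arbitrary illustrative values; the function names are not from any particular package.

```python
import math


def logit(h):
    return math.log(h / (1.0 - h))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def cloglog(h):
    """Complementary log-log transform of a hazard probability."""
    return math.log(-math.log(1.0 - h))


def inv_cloglog(x):
    return 1.0 - math.exp(-math.exp(x))


baseline = [0.01, 0.10, 0.35]  # hypothetical baseline hazards
shift = 0.25                   # hypothetical predictor effect

shifted_logit = [inv_logit(logit(h) + shift) for h in baseline]
shifted_cloglog = [inv_cloglog(cloglog(h) + shift) for h in baseline]

for h, a, b in zip(baseline, shifted_logit, shifted_cloglog):
    print(f"baseline {h:.2f}: logit link -> {a:.4f}, cloglog link -> {b:.4f}")
```

Under the complementary log-log link, a shift of this size raises the probability of surviving each period to the power exp(0.25), whereas under the logit link it multiplies the odds of the event by exp(0.25). The two links give similar answers when hazards are small and diverge as hazards grow.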

But is it sensible to assume that the effects of 
all predictors are unilaterally time-constant and 
that all logit-hazard profiles are proportional in 
practice? In reality, many predictors may not 
only displace the logit-hazard profile, they may 
also alter its shape. If the effect of a predic- 
tor varies over time, we must specify a non- 
proportional model that allows the shapes of 
the logit-hazard profiles to differ over time. As 
you will recall from your knowledge of regular 
multiple regression analysis, when the effect of 
one predictor differs by the levels of another, we 
say that the two predictors interact; in this case, 
we say that the predictor interacts with time. To 
add such an effect into our discrete-time hazard 
models, we simply include the cross-product of 
that predictor and time as an additional predic- 
tor in its own right. 
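As one hypothetical way of writing such a model, suppose the effect of RELIG is allowed to change linearly with age. The linear form, the centring of age at 13, and the symbols $\beta_R$ and $\beta_{R \times T}$ are all assumptions made for this illustration; a more general specification would interact the predictor with each time dummy.

```latex
\[
\operatorname{logit} h_i(t_j)
  = [\alpha_{13}\,\mathit{A13}_{ij} + \cdots + \alpha_{21}\,\mathit{A21}_{ij}]
  + \beta_R\,\mathit{RELIG}_i
  + \beta_{R \times T}\,\mathit{RELIG}_i \times (\mathit{AGE}_j - 13)
\]
\[
\text{effect of a 1-unit difference in } \mathit{RELIG} \text{ at age } a
  = \beta_R + \beta_{R \times T}\,(a - 13)
\]
```

Testing whether $\beta_{R \times T}$ differs from zero, for example through the change in deviance when the cross-product is added, is then a test of the proportionality assumption for this predictor.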

We believe that the ability to include, and 
test, interactions with time represents a major 
analytic opportunity for empirical researchers. 
When studying the behavior of individuals 
over very long periods, it seems reasonable to 
hypothesize that the effects of predictors will 
vary as people experience different life stages. 
Although the effects of some predictors will 
remain constant throughout the lifetime, the 
effects of others may dissipate, or increase, 


over time. We believe that it is not hyperbole 
to state that interactions with time are every- 
where, if only researchers took the time to seek 
them out. Present data analytic practice (and the 
widespread availability of prepackaged com- 
puter programs) permits an almost unthinking 
(and often untested) adoption of proportional 
hazards models (as in “Cox” regression), in 
which the effects of predictors are constrained 
to be constant over time. Yet we have found, in a
wide variety of substantive applications includ- 
ing not only our own work on employment 
duration (Murnane, Singer and Willett, 1989), 
but also others’ work on topics such as age at 
first suicide ideation (Bolger et al., 1989) and 
child mortality (Trussell and Hammerslough,
1983), that interactions with time seem to be 
the rule, rather than the exception. We have 
every reason to believe that once researchers 
start looking for interactions with time, they 
will arise commonly. The key is to test the ten- 
ability of the assumption of a time-invariant 
effect. We refer the interested reader to Singer 
and Willett (2003). 


6 Is survival analysis really 
necessary? 


In this chapter, we have introduced innova- 
tive statistical methods for investigating the 
occurrence and timing of target events. We 
hope that our presentation has encouraged you 
to learn more about these methods, because 
we believe that they offer analytic capabili- 
ties that other methods do not. However, we 
believe that a decision to use these methods 
is more than a simple preference—on the con- 
trary, we are convinced that failure to use 
these methods, when they are required, can 
be a downright error! Our reason for taking 
this strong position has to do with the obvious 
failure of other statistical methods—including 
the simpler and more traditional methods— 
in addressing these same research questions. 
Survival-analytic methods deal evenhandedly 
with the presence of the censored cases, which
are permitted to contribute information to the 
analysis up until the point at which they are 
censored. Traditional analytic methods, in con- 
trast, either ignore censoring or deal with it in 
an ad hoc way, leading to problems such as neg- 
atively biased estimates of aggregate event time, 
or disregard for variation in risk over time. Fur- 
ther, because of their limitations in including 
predictors whose values vary over time or for 
permitting the effects of predictors to fluctuate 
over time, traditional methods also suffer when 
compared to survival analysis. Whereas traditional
methods force researchers to build static
models of dynamic processes, survival methods
allow researchers to model dynamic processes 
dynamically. For all these reasons, we invite 
you to investigate the possibilities offered by 
survival methods. 


Glossary 


Censoring All individuals at risk of experienc- 
ing the target event but who do not experience 
it during the observation period are said to be 
censored. 


Hazard probability The population hazard 
probability in any particular discrete-time 
period is the conditional probability that a ran- 
domly selected population member will experi- 
ence the target event in that time period, given 
that he or she did not experience it in a prior 
time period. 


Hazard function A plot of population hazard 
probabilities versus the corresponding discrete- 
time periods. 


Median lifetime The population median life- 
time is the length of time that must pass before 
the population survival probability drops below 
a value of one-half. In other words, it is the time 
beyond which 50% of the population has still 
to experience the target event. The median life- 
time can be thought of as an “average” time to 
event. 


Person-period dataset A longitudinal dataset 
used in the conduct of discrete-time survival 
analysis, in which each person contributes one 
record for each discrete-time period in which 
he or she is at risk of event occurrence. 


Risk set In any discrete time-period, the risk 
set contains only those participants who remain 
eligible to experience the target event in the 
period. At the “beginning of time” all participants
must be legitimate members of the “risk
set,” by definition. When a participant experiences
the target event, or disappears because of
censoring, he or she is no longer considered a 
member of the “risk set.” 


Survival analysis A statistical method for
addressing research questions that ask whether, 
and if so when, a target event occurs. Discrete- 
time survival analysis involves the modeling of 
the population hazard probability as a function 
of predictors. 


Survival probability The population survival
probability in any discrete time-period is the 
probability that a randomly selected population 
member will “survive” beyond the current time 
period without experiencing the target event. 
In other words, it is the probability that he or 
she will not experience the event during the 
current, or any earlier, time period. 


Survivor function A plot of population sur- 
vival probabilities versus the corresponding 
discrete time-periods. 


Target event (outcome) The uniquely defined 
event whose occurrence and timing the 
researcher is investigating. 
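
As an informal illustration of how several glossary terms fit together, the sketch below computes sample hazard probabilities, survival probabilities, and the median lifetime from risk-set counts. The counts and the function names are hypothetical and chosen only for illustration; the median lifetime is taken, as in the glossary, to be the first period in which the survival probability drops below one-half.

```python
def life_table(n_at_risk, n_events):
    """Return per-period hazard and survival probabilities.

    n_at_risk[j] is the size of the risk set in period j;
    n_events[j] is the number of target events observed in period j.
    """
    hazards, survival = [], []
    surviving = 1.0
    for at_risk, events in zip(n_at_risk, n_events):
        hazard = events / at_risk        # conditional probability of the event
        surviving *= 1.0 - hazard        # probability of no event so far
        hazards.append(hazard)
        survival.append(surviving)
    return hazards, survival


def median_lifetime_period(survival):
    """First period (1-indexed) in which survival drops below 0.5, else None."""
    for period, s in enumerate(survival, start=1):
        if s < 0.5:
            return period
    return None


# Hypothetical data: 100 people enter; 4 are censored after period 3.
n_at_risk = [100, 90, 72, 50]
n_events = [10, 18, 18, 10]
hazards, survival = life_table(n_at_risk, n_events)
median_period = median_lifetime_period(survival)
```

Censored individuals simply leave the risk set for later periods; they are never counted as events.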
Chapter 28


Generalized estimating equations for 
longitudinal panel analysis 
Joseph M. Hilbe and James W. Hardin 


Correlated datasets arise from repeated mea- 
sures studies where multiple observations are 
collected from a specific sampling unit (a spe- 
cific bank’s accounts receivable status over 
time), or from clustered data where observa- 
tions are grouped based on a shared char- 
acteristic (banks in a specific zip code). 
When measurements are collected over time, 
the term “longitudinal” or “panel data” is 
often preferred. The generalized linear models 
framework for independent data is extended 
to correlated data via the generalized estimat- 
ing equations framework. We discuss the esti- 
mation of model parameters and associated 
variances via generalized estimating equation 
methodology. 


1 Introduction 


Parametric model construction specifies the 
systematic and random components of varia- 
tion. Maximum likelihood models rely on the 
validity of these specified components, and 
then model construction proceeds from the 
(components of variation) specification to a 
likelihood and thereupon to an estimating equa- 
tion. The estimating equation for maximum 
likelihood estimation is obtained by equat- 
ing the derivative of the log-likelihood (with 
respect to each of the parameters of interest) to
zero, and solving. Point estimates of unknown
parameters are thus obtained by solving the esti- 
mating equation. 


2 Generalized linear models 


The theory and an algorithm appropriate for 
obtaining maximum likelihood estimates where 
the response follows a distribution in the single 
parameter exponential family were introduced
in Nelder and Wedderburn (1972). This ref- 
erence introduced the term generalized linear 
models (GLMs) to refer to a class of models 
which could be analyzed by a single algorithm. 
The theoretical justification of and the practical 
application of GLMs have since been described 
in many articles and books; McCullagh and 
Nelder (1989) is the classic reference. 

GLMs encompass a wide range of commonly 
used models such as linear regression for con- 
tinuous outcomes, logistic regression for binary 
outcomes, and Poisson regression for count data 
outcomes. The specification of a particular GLM 
requires a link function that characterizes the 
relationship of the mean response to a vector of 
covariates. In addition, a GLM requires specification
of a variance function that expresses the
variance of the outcomes as a function of the
mean. That the variance must be specified as a
function of the mean times a scalar constant of
proportionality is a restriction of the class of
models.

The derivation of the iteratively reweighted 
least squares (IRLS) algorithm appropriate for 
fitting GLMs begins with the general likelihood 
specification for the single parameter exponen- 
tial family of distributions. Within an iterative 
algorithm, an updated estimate of the coef- 
ficient vector may be obtained via weighted 
ordinary least squares. The estimation is then 
iterated to convergence; e.g., until the change in 
the estimated coefficient vector is smaller than 
some specified tolerance. 
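
The iteration just described can be sketched for one concrete case. The code below is a minimal illustration, assuming a Poisson model with the log link; it is not the algorithm of any particular package. The helper names, the zero starting values, and the toy data are our own assumptions. Each pass forms a working response and weights, solves a weighted least squares problem, and stops when the largest change in the coefficients falls below a tolerance.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def irls_poisson(X, y, tol=1e-10, max_iter=100):
    """Fit a log-link Poisson regression by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [0.0] * p                      # assumed starting values
    for _ in range(max_iter):
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working response: eta + (y - mu) / (dmu/deta), with dmu/deta = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weights: (dmu/deta)^2 / V(mu) = mu for the Poisson log-link model.
        w = mu
        XtWX = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X)) for b in range(p)]
                for a in range(p)]
        XtWz = [sum(wi * r[a] * zi for wi, r, zi in zip(w, X, z)) for a in range(p)]
        new_beta = solve(XtWX, XtWz)
        change = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta


# Toy data: intercept plus one binary covariate (group means 2 and 6).
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [2, 3, 1, 2, 5, 7, 6, 6]
beta_hat = irls_poisson(X, y)
fitted = [math.exp(sum(xj * bj for xj, bj in zip(row, beta_hat))) for row in X]
```

Because the covariate is a single group indicator, the fitted means reproduce the two group means, so the exponentiated slope is the ratio of those means.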

For any response that follows a member of 
the single parameter exponential family of dis- 
tributions, 


$$f(y) = \exp\{[y\theta - b(\theta)]/\phi + c(y, \phi)\}$$


where θ is the canonical parameter and φ is
a proportionality constant, and we can obtain
maximum likelihood estimates of the p × 1
regression coefficient vector β by solving the
estimating equation given by


$$\Psi(\beta) = \sum_{i=1}^{n} x_i^{T}\,\frac{y_i - \mu_i}{\phi\,V(\mu_i)}\left[\frac{\partial \mu_i}{\partial \eta_i}\right] = 0_{(p \times 1)}$$


In the estimating equation, x_i is the ith row
of an n × p matrix of covariates X, μ_i = g⁻¹(x_i β)
represents the expected outcome E(y) = b′(θ) in
terms of a transformation of the linear predictor
η_i = x_i β via a monotonic (invertible) link function
g(), and the variance V(μ_i) is a function of
the expected value proportional to the variance
of the outcome, V(y_i) = φV(μ_i). The estimating
equation is sometimes called the score equation
since it equates the score vector Ψ(β) to zero.
Those involved in modeling GLMs are free
to specify a link function as well as a variance
function. If the link-variance pair of functions
coincides with those functions from a single
member of the exponential family of distributions,
the resulting estimates are equivalent
to maximum likelihood estimates. However,
modelers are not limited to such choices. When
the selected variance and link functions do
not correspond to a particular exponential family
member distribution, the estimating equation is
said to imply the existence of a quasilikelihood;
the resulting estimates are referred to as maximum
quasilikelihood estimates.

The link function that equates the canonical
parameter θ with the linear predictor η_i = x_i β is
called the canonical link. One advantage to the 
interpretation of results given the selection of 
the canonical link is that the estimating equa- 
tion simplifies to 


$$\Psi(\beta) = \sum_{i=1}^{n} x_i^{T}\,(y_i - \mu_i)/\phi = 0_{(p \times 1)}$$


enforcing equivalence of the mean of the fitted 
and observed outcomes. A second advantage of 
the canonical link over other link functions is 
that the expected Hessian matrix is equal to the 
observed Hessian matrix. 
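
As a worked illustration of this simplification, consider the Poisson model with the log link, a standard example that we add here for concreteness. The Poisson density has canonical parameter θ = ln μ, b(θ) = e^θ, and φ = 1, so that the mean and variance function follow by differentiation:

```latex
\begin{aligned}
E(y) &= b'(\theta) = e^{\theta} = \mu, \qquad V(\mu) = b''(\theta) = \mu, \\
\eta &= \ln \mu \;\Rightarrow\; \frac{\partial \mu}{\partial \eta} = \mu, \\
\Psi(\beta) &= \sum_{i=1}^{n} x_i^{T}\,\frac{y_i - \mu_i}{\phi\,V(\mu_i)}
               \left[\frac{\partial \mu_i}{\partial \eta_i}\right]
             = \sum_{i=1}^{n} x_i^{T}\,\frac{(y_i - \mu_i)\,\mu_i}{\mu_i}
             = \sum_{i=1}^{n} x_i^{T}\,(y_i - \mu_i) = 0_{(p \times 1)} .
\end{aligned}
```

When the covariates include a constant, the corresponding element of this equation reads Σ(y_i − μ_i) = 0, which is the equality of the mean fitted and mean observed outcomes noted above.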


3 The independence model 


A basic individual-level model is written in
terms of the n individual observations y_i for
i = 1, ..., n. When observations are clustered,
due to repeated observations on the sampling
unit or because the observations are grouped by
identification through a cluster identifier variable,
the model may be written in terms of the
observations y_it for the clusters i = 1, ..., n and
the within-cluster repeated, or related, observations
t = 1, ..., n_i. The total number of observations
is then N = Σ_i n_i. The clusters may also
be referred to as panels, subjects, or groups.
In this presentation, the clusters i are independent,
but the within-cluster observations
y_it may be correlated. An independence model,
however, assumes that the within-cluster obser- 
vations are not correlated. 
The independence model is a special case of
more sophisticated correlated data approaches 
(such as GEE). This model assumes that there 
is no correlation of the observations within 
clusters. Therefore, the model specification 
may be in terms of the individual observations 
y_it. Although the independence model assumes
that the repeated measures are independent, the 
model still provides consistent estimators in the 
presence of correlated data. Of course, this con- 
sistency is paid for through inefficiency, though 
the efficiency loss is not always large as inves- 
tigated by Glonek and McCullagh (1995). As 
such, this much-simplified model remains an 
attractive alternative because of its computa- 
tional simplicity as well as its easy interpreta- 
tion. The independence model also serves as a 
reference model in the derivation of diagnos- 
tics for more sophisticated models for clustered 
data (such as GEE models). 

The validity of the (naive) model-based vari- 
ance estimators depends on the correct specifi- 
cation of the variance; in turn this depends on 
the correct specification of the working correla- 
tion model. A formal justification for an alterna- 
tive estimator known as the sandwich variance 
estimator is given in Huber (1967). 

Analysts can use the independence model 
to obtain point estimates along with standard 
errors based on the modified sandwich vari- 
ance estimator to ensure that inference is robust 
to any type of within-cluster correlation. While 
the inference regarding marginal effects is valid 
(assuming that the model for the mean is cor- 
rectly specified), the estimator from the inde- 
pendence model is not efficient when the data 
are correlated. 

It should be noted that assuming indepen- 
dence is not always conservative; the model- 
based (naive) variance estimates based on the 
observed or expected Hessian matrix are not 
always smaller than those of the modified 
sandwich variance estimator. Since the sand- 
wich variance estimator is sometimes called 
the robust variance estimator, this result may
seem counterintuitive. However, it is easily
seen by assuming negative within-cluster cor- 
relation leading to clusters with both positive 
and negative residuals. The cluster-wise sums 
of those residuals will be small and the result- 
ing modified sandwich variance estimator will 
yield smaller standard errors than the model- 
based Hessian variance estimators. 
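
The point can be checked numerically in the simplest possible setting. The sketch below is our own simplified illustration: it assumes an intercept-only model with identity link fitted under independence, so the point estimate is the overall mean. It omits the finite-sample adjustments that software typically applies, and the data are invented. The naive variance uses the pooled residual variance, whereas the modified sandwich variance is built from the cluster-wise sums of residuals.

```python
def naive_and_sandwich_var(clusters):
    """Variance of the overall mean under an independence working model.

    Returns (naive, sandwich): the model-based variance and the
    cluster-based (modified sandwich) variance, with no small-sample
    corrections applied.
    """
    y_all = [y for cluster in clusters for y in cluster]
    n_obs = len(y_all)
    mean = sum(y_all) / n_obs
    resid = [[y - mean for y in cluster] for cluster in clusters]
    # Model-based: residual variance divided by the number of observations.
    sigma2 = sum(r * r for cluster in resid for r in cluster) / (n_obs - 1)
    naive = sigma2 / n_obs
    # Sandwich: "bread" is 1/n_obs, "meat" is the sum of squared cluster sums.
    meat = sum(sum(cluster) ** 2 for cluster in resid)
    sandwich = meat / n_obs ** 2
    return naive, sandwich


# Negative within-cluster correlation: residuals of opposite sign per cluster.
negative = [[4.0, 6.4], [3.0, 6.6], [6.5, 3.9], [5.6, 4.0]]
# Positive within-cluster correlation: residuals of the same sign per cluster.
positive = [[3.0, 3.4], [7.0, 6.6], [4.0, 4.4], [6.0, 5.6]]

naive_neg, sandwich_neg = naive_and_sandwich_var(negative)
naive_pos, sandwich_pos = naive_and_sandwich_var(positive)
```

In the first dataset the cluster sums nearly cancel, so the sandwich variance is far smaller than the naive one; in the second the ordering is reversed.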

Other obvious approaches to analysis of the 
nested structure assumed for the data include 
fixed-effects and random-effects models. Fixed- 
effects models incorporate a fixed increment 
to the model for each group, while random- 
effects models assume that the incremental 
effects from the groups are perturbations from 
a common random distribution; in such a 
model the parameters (variance components) 
of the assumed random-effects distribution 
are estimated rather than the effects. In the 
example at the end of this entry, we consider 
two different distributions for random effects 
in a Poisson model. 


4 Subject-specific (SS) versus 
population-averaged (PA) models 


There are two main approaches to dealing with 
correlation in repeated or longitudinal data. 
One approach focuses on the marginal effects 
averaged across the individuals (population- 
averaged approach), and the second approach 
focuses on the effects for given values of the 
random effects by fitting parameters of the 
assumed random-effects distribution (subject- 
specific approach). 

The population-averaged approach models 
the average response for observations sharing 
the same covariates (across all of the clus- 
ters or subjects), while the subject-specific 
approach explicitly models the source of het- 
erogeneity so that the fitted regression coeffi- 
cients have an interpretation in terms of the 
individuals. 

The most commonly described GEE model 
was introduced in Liang and Zeger (1986). This 
is a population-averaged approach. While it is
possible to derive subject-specific GEE mod- 
els, such models are not currently supported 
in commercial software packages and so do not 
appear nearly as often in the literature. 

The basic idea behind this approach is illus- 
trated as follows. We initially consider the esti- 
mating equation for GLMs. The estimating equa- 
tion, in matrix form, for the exponential family 
of distributions can be expressed as 


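$$\begin{aligned}
\Psi(\beta) &= \sum_{i=1}^{n} X_i^{T}\, D\!\left[\frac{\partial \mu_i}{\partial \eta_i}\right] V^{-1}(\mu_i)\,(y_i - \mu_i)/\phi \\
&= \sum_{i=1}^{n} X_i^{T}\, D\!\left[\frac{\partial \mu_i}{\partial \eta_i}\right] V^{-1/2}(\mu_i)\, I_{(n_i \times n_i)}\, V^{-1/2}(\mu_i)\,(y_i - \mu_i)/\phi = 0_{(p \times 1)}
\end{aligned}$$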


Assuming independence, V⁻¹(μ_i) is clearly an
n_i × n_i diagonal matrix which can be factored
with an identity matrix in the center playing the 
role of the correlation of observations within 
a given group or cluster. This corresponds to 
the independence model we have previously 
discussed. 

The genesis of the original population-averaged
generalized estimating equations is
to replace the identity matrix with a parameterized
working correlation matrix R(α). To
address correlated data, the working correlation
matrix imposes structural constraints. In this
way, the independence model is a special case
of the GEE specifications where R(α) is an identity
matrix.
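
Written out, the substitution takes the following form. This is the standard presentation associated with Liang and Zeger (1986); the symbol D[V(μ_it)] for the diagonal matrix of variance functions is our own shorthand. Because the identity matrix is its own inverse, the working correlation matrix enters the estimating equation through its inverse:

```latex
\begin{aligned}
V(\mu_i) &= D[V(\mu_{it})]^{1/2}\, R(\alpha)\, D[V(\mu_{it})]^{1/2}, \\
\Psi(\beta) &= \sum_{i=1}^{n} X_i^{T}\, D\!\left[\frac{\partial \mu_i}{\partial \eta_i}\right]
               D[V(\mu_{it})]^{-1/2}\, R(\alpha)^{-1}\, D[V(\mu_{it})]^{-1/2}\,
               (y_i - \mu_i)/\phi = 0_{(p \times 1)} .
\end{aligned}
```

Setting R(α) equal to the identity matrix recovers the estimating equation of the independence model.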

Formally, Liang and Zeger introduce a sec- 
ond estimating equation for the structural para- 
meters of the working correlation matrix. The 
authors then establish the properties of the esti- 
mators resulting from the solution of these 
estimating equations. The GEE moniker was 
applied because the model is derived through 
a generalization of the GLM estimating equation;
the second order variance components are
introduced directly into the estimating equation
rather than appearing in consideration
of a multivariate likelihood. There are sev- 
eral software packages that support estimation 
of these models. These packages include R, 
SAS, S-PLUS, Stata, LIMDEP, and SUDAAN. 
R and S-PLUS users can easily find user- 
written software tools for fitting GEE models, 
while such support is included in the other 
packages. 


5 Estimating the working 
correlation matrix 


One should carefully consider the parameteri- 
zation of the working correlation matrix since 
including the correct parameterization leads to 
more efficient estimates. We want to carefully 
consider this choice even if we employ the 
modified sandwich variance estimator in the 
calculation of standard errors and confidence 
intervals for the regression parameters. While 
the use of the modified sandwich variance esti- 
mator assures robustness in the case of mis- 
specification of the working correlation matrix, 
the advantage of more efficient point estimates 
is still worth this effort. There is no contro- 
versy as to the fact that the GEE estimates are 
consistent, but there is some controversy with 
regard to their efficiency. This concern centers 
on how well the correlation parameters can be 
estimated. 

Typically, a careful analyst chooses some
small number of candidate parameterizations.
Pan (2001) discusses the quasilikelihood
information criterion (QIC) measure
for choosing between candidate parameterizations.
This criterion measure is similar to
the well-known Akaike information criterion
(AIC).

The most common choices for the working 
correlation R matrix are given by structural 
constraints, parameterizing the elements of the 
matrix as: 
Table 28.1 Common correlation structures. Values
are given for u ≠ v; R_uu = 1

Independent            R_uv = 0
Exchangeable           R_uv = α
Autocorrelated AR(1)   R_uv = α^|u−v|
Stationary(k)          R_uv = α_|u−v| if |u−v| ≤ k; 0 otherwise
Nonstationary(k)       R_uv = α_(u,v) if |u−v| ≤ k; 0 otherwise
Unstructured           R_uv = α_(u,v)


The independence model admits no extra 
parameters and the resulting model is equiva- 
lent to a generalized linear model specification. 
The exchangeable correlation parameterization
admits one extra parameter. The most general
approach is to consider the unstructured
(only imposing symmetry) working correlation
parameterization, which admits M(M − 1)/2
extra parameters where M = max_i{n_i}.
The exchangeable correlation specification, the
most commonly used correlation structure for
GEEs, is also known as equal correlation, common
correlation, and compound symmetry.

The elements of the working correlation
matrix are estimated using the Pearson residuals,
calculated following each iteration of model fit.
Estimation alternates between estimating the
regression parameters β, assuming that the initial
estimates of α are true, and then obtaining
residuals to update the estimate of α, and
then using estimates of α to calculate updated
parameter estimates, and so forth until convergence.
GEE algorithms are built around a GLM
basis, with a subroutine called to update values
of α. Estimation of GEE models using other correlation
structures uses a similar methodology;
only the properties of each correlation structure
differ.

A schematic representing how the foremost
correlation structures appear is found
below. Discussion on how the elements in each
matrix are to be interpreted in terms of model
fit can be found in Twisk (2003) and Hardin
and Hilbe (2003).

INDEPENDENT
1
0    1
0    0    1
0    0    0    1

EXCHANGEABLE
1
p    1
p    p    1
p    p    p    1

M-DEPENDENT
1
p1   1
p2   p1   1
0    p2   p1   1

AUTOREGRESSIVE
1
p^1  1
p^2  p^1  1
p^3  p^2  p^1  1

UNSTRUCTURED
1
p1   1
p2   p4   1
p3   p5   p6   1
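
The structures above, and the residual-based update of the correlation parameter, can be sketched in a few lines. The code is illustrative only. The function names are ours, and the exchangeable estimator shown is one simple moment estimator: the average cross-product of Pearson residuals over within-cluster pairs, scaled by the dispersion. Software implementations differ in details such as degrees-of-freedom corrections.

```python
def exchangeable(m, alpha):
    """m x m working correlation with a common off-diagonal value."""
    return [[1.0 if u == v else alpha for v in range(m)] for u in range(m)]


def ar1(m, alpha):
    """m x m AR(1) working correlation: R_uv = alpha ** |u - v|."""
    return [[alpha ** abs(u - v) for v in range(m)] for u in range(m)]


def stationary(m, alphas):
    """m x m stationary(k) structure, where k = len(alphas).

    alphas[lag - 1] is the correlation at each lag up to k;
    correlations at longer lags are set to zero.
    """
    k = len(alphas)

    def entry(u, v):
        lag = abs(u - v)
        if lag == 0:
            return 1.0
        return alphas[lag - 1] if lag <= k else 0.0

    return [[entry(u, v) for v in range(m)] for u in range(m)]


def estimate_exchangeable_alpha(resid_by_cluster, phi=1.0):
    """Moment estimate of the exchangeable alpha from Pearson residuals."""
    total, n_pairs = 0.0, 0
    for resid in resid_by_cluster:
        for u in range(len(resid)):
            for v in range(u + 1, len(resid)):
                total += resid[u] * resid[v]
                n_pairs += 1
    return total / (phi * n_pairs)


R_ar = ar1(4, 0.5)
R_exch = exchangeable(3, 0.3)
R_stat = stationary(4, [0.4, 0.2])
alpha_hat = estimate_exchangeable_alpha([[1.0, 1.0], [-1.0, -1.0]])
```

In a full GEE fit, a routine of this kind would be called between updates of β, with the Pearson residuals recomputed from the current fitted means at each pass.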
5.1 Example


To highlight the interpretation of GEE analyses 
and point out the alternate models, we focus on 
a simple example. 

These data are from a panel study on Progabide
drug treatment of epilepsy. Baseline measures of
the number of seizures in an eight-week period
were collected and recorded as base for 59 different
patients. The number of seizures was also counted
in each of four follow-up two-week periods; these
counts were recorded as s1, s2, s3, and s4. The numbers
recorded in the base variable were divided
by four in our analyses to put this array of
observations on the same scale as the follow-up
counts. The age variable records the patient’s
age in years, and the trt variable indicates
whether the patient received the Progabide
treatment (value recorded as one) or was part
of the control group (value recorded as zero).

Table 28.2 Data from Progabide study on epilepsy (59 patients over 5 weeks)

id  age  trt  base   s1   s2   s3   s4
 1   31    0    11    5    3    3    3
 2   30    0    11    3    5    3    3
 3   25    0     6    2    4    0    5
 4   36    0     8    4    4    1    4
 5   22    0    66    7   18    9   21
 6   29    0    27    5    2    8    7
 7   31    0    12    6    4    0    2
 8   42    0    52   40   20   23   12
 9   37    0    23    5    6    6    5
10   28    0    10   14   13    6    0
11   36    0    51   26   12    6   22
12   24    0    33   12    6    8    5
13   23    0    18    4    4    6    2
14   36    0    42    7    9   12   14
15   26    0    87   16   24   10    9
16   26    0    50   11    0    0    5
17   28    0    18    0    0    3    3
18   31    0   111   37   29   28   29
19   32    0    18    3    5    2    5
20   21    0    20    3    0    6    7
21   29    0    12    3    4    3    4
22   21    0     9    3    4    3    4
23   32    0    17    2    3    3    5
24   25    0    28    8   12    2    8
25   30    0    55   18   24   76   25
26   40    0     9    2    1    2    1
27   19    0    10    3    1    4    2
28   22    0    47   13   15   13   12
29   18    1    76   11   14    9    8
30   32    1    38    8    7    9    4
31   20    1    19    0    4    3    0
32   20    1    10    3    6    1    3
33   18    1    19    2    6    7    4
34   24    1    24    4    3    1    3
35   30    1    31   22   17   19   16
36   35    1    14    5    4    7    4
37   57    1    11    2    4    0    4
38   20    1    67    3    7    7    7
39   22    1    41    4   18    2    5
40   28    1     7    2    1    1    0
41   23    1    22    0    2    4    0
42   40    1    13    5    4    0    3
43   43    1    46   11   14   25   15
44   21    1    36   10    5    3    8
45   35    1    38   19    7    6    7
46   25    1     7    1    1    2    4
47   26    1    36    6   10    8    8
48   25    1    11    2    1    0    0
49   22    1   151  102   65   72   63
50   32    1    22    4    3    2    4
51   25    1    42    8    6    5    7
52   35    1    32    1    3    1    5
53   21    1    56   18   11   28   13
54   41    1    24    6    3    4    0
55   32    1    16    3    5    4    3
56   26    1    22    1   23   19    8
57   21    1    25    2    3    0    1
58   36    1    13    0    0    0    0
59   37    1    12    1    4    3    2
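
Panel models of this kind are usually fitted to data with one record per patient per follow-up period rather than one record per patient. The sketch below shows one way such a layout might be prepared, using the first two patients of Table 28.2. The function name, the field name seizures, and the coding of time as 1 to 4 are our own assumptions, not details given by the authors.

```python
# First two patients from Table 28.2, in the wide (one row per patient) layout.
wide = [
    {"id": 1, "age": 31, "trt": 0, "base": 11, "s1": 5, "s2": 3, "s3": 3, "s4": 3},
    {"id": 2, "age": 30, "trt": 0, "base": 11, "s1": 3, "s2": 5, "s3": 3, "s4": 3},
]


def to_long(rows, n_periods=4):
    """Reshape wide records into one record per patient per follow-up period."""
    long_rows = []
    for row in rows:
        for time in range(1, n_periods + 1):
            long_rows.append({
                "id": row["id"],
                "time": time,                     # assumed coding: 1, 2, 3, 4
                "seizures": row[f"s{time}"],      # count in this two-week period
                "age": row["age"],
                "trt": row["trt"],
                # Eight-week baseline count rescaled to a two-week scale.
                "base": row["base"] / 4,
            })
    return long_rows


long_rows = to_long(wide)
```

Each patient then contributes four records, which form that patient's cluster in the models that follow.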

An obvious approach to analyzing the data is 
to hypothesize a Poisson model for the number 
of seizures. Since we have repeated measures, 
there are many competing models. In our illus- 
trations of these alternative models, we utilize 
the baseline measure as a covariate along with 
the time and age variables. 

Table 28.3 contains the results of several 
analyses. For each covariate, we list the esti- 
mated incidence rate ratio (exponentiated Pois- 
son coefficient). Following the incidence rate 
ratio estimates, we list the classical (for all 
models) and sandwich-based (for all but the 
gamma distributed random-effects model) esti- 
mated standard errors. 


We re-emphasize that the independence 
model coupled with standard errors based on the 
modified sandwich variance estimator is a valid 
approach to modeling these data. The weakness 
of the approach is that the estimators will not 
be as efficient as a model including the true 
underlying within-cluster correlation structure. 
Another standard approach to modeling this type 
of repeated measures would be to hypothesize 
that the correlations are due to individual- 
specific random intercepts. These random 
effects (one could also hypothesize fixed effects) 
would lead to alternate models for the data. 

Results from two different random-effects 
models are included in the table. We could 
hypothesize that the correlation follows an 
autoregressive process since the data are col- 
lected over time. However, this is not always 
the best choice; in the present experiment we 
would have to believe that the hypothesized 
correlation structure applies to both the treated 
and untreated groups. 
Table 28.3 Estimated incidence rate ratios and standard errors for various Poisson models

Model         time                 trt                  age                  baseline
Independence  0.944 (0.019,0.033)  0.832 (0.039,0.143)  1.019 (0.003,0.010)  1.095 (0.002,0.006)
Gamma RE      0.944 (0.019)        0.810 (0.124)        1.013 (0.011)        1.116 (0.015)
Gaussian RE   0.944 (0.019,0.033)  0.760 (0.117,0.117)  1.011 (0.011,0.009)  1.115 (0.012,0.011)
GEE(exch)     0.939 (0.019,0.019)  0.834 (0.058,0.141)  1.019 (0.005,0.010)  1.095 (0.003,0.006)
GEE(ar1)      0.939 (0.019,0.019)  0.818 (0.054,0.054)  1.021 (0.005,0.003)  1.097 (0.003,0.003)
GEE(unst)     0.951 (0.017,0.041)  0.832 (0.055,0.108)  1.019 (0.005,0.009)  1.095 (0.003,0.005)


The QIC values (described in Pan, 2001)
for the independence, exchangeable, ar1, and
unstructured correlation structures are respectively
given by −5826.23, −5826.25, −5832.20,
and −5847.91. This criterion measure indicates
a preference for the unstructured model over
the autoregressive model. The fitted correlation
matrices for these two models (printing only the bottom
half of the symmetric matrices) are given by

Autoregressive
1.00
0.51  1.00
0.26  0.51  1.00
0.13  0.26  0.51  1.00

Unstructured
1.00
0.25  1.00
0.42  0.68  1.00
0.22  0.28  0.58  1.00


Note that if the exchangeable correlation 
structure were used, the correlation matrix 
would show a single value for all subdiagonal 
cells. A full discussion of the various correlation
structures and how each is to be evaluated
can be found in Hardin and Hilbe (2003). 
Chapter 29


Linear panel analysis 
Steven E. Finkel 


1 Introduction 


Linear panel analysis refers to the statistical 
models and methods appropriate for the analy- 
sis of continuous or quantitative outcomes with 
data collected on multiple units (i.e., individ- 
uals, schools, or countries) at more than one 
point in time. This means that linear panel 
analysis is concerned with the various ways 
of analyzing change in continuous variables, 
in describing different patterns of change for 
different units, in modeling why some units 
change more than others and what variables are 
responsible for these differences. These meth- 
ods may be distinguished from other kinds 
of longitudinal analyses, such as loglinear, 
transition or Markov models for analyzing over- 
time change in categorical outcomes, event his- 
tory or duration models for analyzing temporal 
processes leading to the occurrence of specific 
events, and time-series methods for the anal- 
ysis of change in continuous outcomes for a 
single unit over a relatively long period of time. 
Hence, what characterizes linear panel analysis 
is a focus on continuous outcomes for multiple 
units at multiple points. 

It is conventional within this general rubric 
to make one further distinction. In some panel 
datasets, time is dominant, i.e., relatively few 
units have been observed for relatively long 
periods of time. In other data, “N” is dominant,
i.e., many units have been observed for
relatively few points in time. Although the two
kinds of data have the same formal structure, 
time-dominant data, sometimes referred to as 
“time-series cross-sectional data,” is typically 
analyzed with statistical methods rooted in the 
time-series econometric tradition (see Beck and 
Katz, 1995; Greene, 2003; see also Worrall, 
Chapter 15 in this volume). This chapter is con- 
cerned with the statistical methods used for 
the analysis of “N-dominant” panel data, typ- 
ically with observations on hundreds or pos- 
sibly thousands of units observed at two or 
more “waves” in time. Examples of such panel
data are the National Election Studies (NES)
panels that track thousands of the same respondents
across multiple presidential and congressional
elections in 1956-58-60, 1972-74-76,
and 2000-2002-2004, the multiwave US Panel
Study of Income Dynamics (PSID), and the
German Socio-Economic Panel (SOEP) covering
some twenty-one waves of observation since
1986.¹

¹ Available at www.umich.edu/~nes/, psidonline.isr.umich.edu/,
and www.diw.de/english/sop/, respectively.

There are several important motivations for
analyzing panel versus cross-sectional data.
Consider the hypothesis that economic performance
contributes to the consolidation or stability
of democratic regimes. This hypothesis
could be tested with cross-sectional data by
predicting some sample of countries’ level of 
democracy at a given point in time with rel- 
evant economic indicators, perhaps includ- 
ing additional variables related to the social 
or political characteristics of the countries 
as statistical controls. Statistically significant 
effects of the economic variables would then 
be taken as supporting the hypothesis; if the 
researcher was confident enough in the speci- 
fication of the model or otherwise sufficiently 
bold, he or she might even claim that economic 
performance had a causal effect on countries’ 
level of democracy. 

Yet serious obstacles exist for successful 
causal inference in the cross-sectional context, 
many of which can be dealt with more easily
through the analysis of panel data. First,
cross-sectional data contain no direct measure
of changes in Y or changes in X; it is
implicitly assumed in such designs that comparing
units with higher and lower values on
the independent and dependent variables simulates
“changes” in X (economic performance)
and their impact on changes in Y (democracy).
But what has been conducted is simply
a fixed comparison of countries or units at a
given point in time, which says little directly
about what happens when individuals or units
change. It is far better, from the point of view of
testing theories of social, psychological, or political
change, to observe change directly over time,
and of course that is what longitudinal or panel
data provide at their very foundation.

But the problems of causal inference in cross- 
sectional designs go beyond the lack of direct 
observation of change. First, panel data offer 
decided advantages in dealing with the problem 
of spurious relationships between variables, 
such that some outside variable Z not consid- 
ered by the researcher is actually responsible 
for the observed relationship between X and Y. 
Of course, theoretically-relevant variables that 
are observed in a given dataset should always 
be incorporated into any statistical model to
attempt to avoid this kind of bias. But with
panel data, the researcher has the ability under 
some conditions to control for unmeasured 
variables that may be confounding the observed 
relationship between X and Y. In the economic 
performance and democracy example, variables 
such as a country’s political culture or the 
degree of “entrepreneurialism” in the population
may cause both economic and democratic
outcomes, and to the extent that these variables 
are unobserved in the typical cross-national 
dataset, the estimated effect of economic perfor- 
mance on democracy will suffer from omitted 
variable bias. Panel data are no panacea for this 
problem but they offer the researcher far greater 
options for incorporating “unobserved hetero- 
geneity” between units into statistical mod- 
els, and controlling their potentially damaging 
effects, than is possible in cross-sectional anal- 
yses. 

A second obstacle in cross-sectional data is 
that because X and Y are measured at the same 
time, it is difficult to determine which one 
of the variables can be presumed to “cause” 
the other. Does economic performance lead to 
changes in a country’s level of democracy, or 
does the country’s level of democracy lead 
to higher levels of economic performance (or 
both)? With panel data, the researcher can track
the impact of changes in X, or the level of X,
at earlier points in time on later values of
Y, and estimates of the effects of earlier values
of Y on subsequent values of X can similarly 
be obtained. The ability to estimate dynamic 
models of reciprocal causality between X and 
Y by exploiting intertemporal change in both 
variables is one of the important advantages 
of longitudinal data analysis in general, and 
the methods used for estimating these kinds of 
models for continuous outcomes form a large 
part of the statistical toolkit for linear panel 
analysis. 

Third, it is also the case that successful causal 
inference depends on the accurate measure- 
ment of the variables in any statistical model. 
As is well known, random measurement error
in the independent variable of a bivariate model 
will attenuate the estimated effect of X on Y, 
with the direction of bias in multivariate mod- 
els being completely indeterminate (Bollen, 
1989; Wheaton et al., 1977). While panel data
are certainly not immune to these measurement
problems, the longitudinal information
contained in such data offers the researcher
greater ability to model measurement error in 
the variables than is typically the case with 
cross-sectional data. As is the case with mod- 
els of reciprocal causality, measurement error 
modeling can proceed with fewer potentially 
unrealistic and restrictive assumptions in panel 
analyses, thus strengthening confidence in the 
estimated causal linkages between variables. 

Currently, there are several general methodological
approaches or frameworks within which
researchers conduct linear panel analysis. One
stems from the econometric tradition, and
focuses most explicitly on the problem of unobservables
in the causal system (Baltagi, 2005;
Frees, 2004; Hsiao, 2002; Wooldridge, 2002). The
analysis in this framework typically involves 
pooling or stacking the data across waves. This 
means that each row of data contains informa- 
tion on X and Y from a particular unit at only 
one of the panel waves, with information from 
unit (case) 1 at waves 1, 2,... through time T, 
followed in the dataset with information from 
case 2 at waves 1, 2,... T until the last row con- 
tains information on X and Y from the Nth case 
at the last wave (T) of observation. The pooling 
procedure thus yields N × T total observations for
analysis. This setup allows the researcher considerable
power in estimating a variety of panel
models, including those where Y_t is predicted
by X_t (or X_{t−1}) as well as by an additional factor
U that represents unobserved variables or influences
on Y for a particular unit i that remain
stable over time. These factors give rise to what
is referred to in the econometric literature as
unobserved heterogeneity. In these models, a
single effect of X_t or X_{t−1} on Y_t is typically
produced that is purged of the potentially confounding
influence of the unobservables, with
the information contained in the panel data 
being used in alternative models to “sweep out” 
or take into account the unobserved hetero- 
geneity in different ways, and to model possi- 
ble dependence in the units’ idiosyncratic error 
terms over time. 
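
One familiar way of sweeping out a stable unit-level factor U is the within (fixed-effects) transformation, in which each variable is expressed as a deviation from its unit-specific mean before the slope is estimated. The sketch below illustrates the idea on a small stacked dataset that we have invented: Y equals 2X plus a unit-specific constant, with no idiosyncratic error. The function names are ours, and this is only one of the alternative estimators to which the text refers.

```python
def ols_slope(x, y):
    """Slope from a bivariate least squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_within(units, values):
    """Subtract each unit's own mean from its observations."""
    sums, counts = {}, {}
    for u, v in zip(units, values):
        sums[u] = sums.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - sums[u] / counts[u] for u, v in zip(units, values)]


# Stacked (pooled) data: one row per unit per wave, N = 3 units, T = 3 waves.
# Y = 2 * X + U_i, with U_1 = 10, U_2 = -5, U_3 = 0 (invented values).
unit = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 4.0, 9.0]
y = [12.0, 14.0, 16.0, 3.0, 5.0, 7.0, 4.0, 8.0, 18.0]

pooled_slope = ols_slope(x, y)
within_slope = ols_slope(demean_within(unit, x), demean_within(unit, y))
```

In this constructed example the pooled regression is distorted by the association between X and the unit effects, while the within estimator recovers the slope of 2.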

Another approach to panel analysis stems 
from the structural equation modeling (SEM) 
and path analysis traditions in sociology 
and psychology (Bollen, 1989; Duncan, 1975; 
Finkel, 1995; Kenny, 1979; Kessler and 
Greenberg, 1981). In this framework, a sepa- 
rate equation for each dependent (endogenous) 
variable at each panel wave of observation is 
specified with a set of independent variables, 
which may themselves be either exogenous, or 
unpredicted by other variables in the model, or 
endogenous variables that are caused elsewhere 
in the overall causal chain. Thus Y observed at 
time 2 of the panel may be predicted by Y at 
time 1 (the “lagged endogenous variable”) and 
a series of time 1 Xs, Y at time 3 may be pre- 
dicted by Y and the Xs at time 2, and so forth. 
The resulting series of equations are then, given 
appropriate assumptions about error processes 
and the distribution of observed variables, typ- 
ically estimated simultaneously through maxi- 
mum likelihood or related methods in software 
packages such as LISREL, EQS, MLWin, or SAS. 
The SEM approach is often extended by includ- 
ing additional equations to model random mea- 
surement error in observed indicators of the 
exogenous and endogenous variables, resulting 
in a set of measurement equations linking latent
variables with one or more error-filled indica- 
tors, and a set of structural equations linking the 
latent variables together in the presumed causal 
system. Such models may also be extended 
to test alternative causal lag structures in the 
model, such that variables may be presumed 
to exert causal influence on endogenous vari- 
ables either simultaneously (i.e., at the same 
wave of observation), or lagged by one or more 
time periods. The SEM approach is particularly
useful for the panel analyst in estimating a
variety of reciprocal causal effects models that 
guard against the possibility of biases induced 
by measurement error in the variables, and 
that allow the flexible testing of alternative lag 
structures. 

Until recently, panel researchers have typ- 
ically conducted their analyses within one 
framework or the other, with the choice 
depending to some extent on the leanings of 
the particular social science or other discipline, 
and to some extent on the nature of the substan- 
tive problem and the threats to causal inference 
that were presumed to be especially serious in a 
given research area (i.e., measurement error ver- 
sus unobserved heterogeneity). But much work 
has been done in recent decades to bring the two 
traditions together, or perhaps more accurately, 
to bolster the toolkit of each framework so that 
it incorporates some of the major strengths of 
the other. Thus the SEM approach has recently 
been extended to incorporate models of unob- 
served heterogeneity, while the econometric 
approach has expanded to include more exten- 
sive models of measurement error, reciprocal 
causality, and dynamic processes than were 
typically the case in decades past. While it 
would not be accurate to state that there are 
no remaining differences between the SEM and 
econometric approaches, it is nevertheless the 
case that panel researchers now have the tools 
to at least attempt to overcome the common 
threats to successful causal inference using 
either analytical framework. 

In what follows, I shall provide an overview 
of the basic econometric and SEM methods 
used for linear panel analysis. I shall begin with 
the problem of unobserved heterogeneity, out- 
lining the “fixed” and “random” effects mod- 
els that deal with this problem. The discussion 
will then turn to the SEM approach for esti- 
mating dynamic models of reciprocal causality 
and then models with measurement error in 
the observed indicators. Finally, I will outline
briefly more advanced models within the
econometric and SEM traditions that attempt to
incorporate both unobserved heterogeneity and
dynamic processes in order to strengthen the
causal inference process.²


2 Unobserved heterogeneity models 
for linear panel analysis 


Consider a simple, single equation model of 
the effects of some independent variables $X_1$,
$X_2$, ..., $X_J$ on a dependent variable Y, each measured
at T points in time on a sample of
N individuals. There are no reciprocal effects 
specified and perfect measurement is assumed 
for all variables. We can then write the basic 
panel model for the relationship between the J
X variables and Y for the ith case at
a given time point t as:

$$Y_{it} = \alpha + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_J X_{Jit} + \varepsilon_{it} \qquad (1)$$


This, of course, is the usual model analyzed 
in cross-sectional research (where T=1), using 
ordinary least squares (OLS) under the classical 
assumptions regarding the distribution of the 
idiosyncratic error terms $\varepsilon_{it}$. With panel data,
though, it is clear that the classical assump- 
tions regarding the error term are unlikely to 
be met, for the simple reason that the observa- 
tions are not independent across time. That is, 
the error term for case i at time t is likely to 
be related to the error term for the same case 
at time t+1, t+2, etc. One strategy for han- 
dling this problem is to maintain the structure 
of equation (1) and estimate parameters with 
some variant of “robust” standard errors that 


² This chapter will not discuss, except in passing, a
third tradition for panel analysis, known variously as 
“hierarchical growth” or “mixed” models, or “latent 
growth” models. See Chapters 33-35 in this volume, 
Bollen and Curran (2005), and Singer and Willett 
(2002). 
allow casewise dependence or other deviations
from the standard assumptions.³

Most panel analysts, however, prefer to 
model the sources of the temporal depen- 
dence between observations of a given case 
more directly. One relatively simple setup is 
to assume that the dependence is produced by 
some stable, unobserved factor or factors U that 
are unique to a given unit and are also related 
to Y. This means that the “true” model (with U 
summarized as a single variable) is: 


$$Y_{it} = \alpha + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_J X_{Jit} + U_i + \varepsilon_{it} \qquad (2)$$


For example, if Y is a country’s extent of polit- 
ical repression and the Xs represent some mea- 
sures of democracy and economic performance, 
the U term may encompass factors such as a 
country’s “culture,” history of violence, degree 
of ethnic homogeneity, and the like, all or 
some of which may not have been measured 
or have been otherwise available for inclusion 
in the analysis. If the model were at the indi- 
vidual level, such that Y represents, for exam- 
ple, a person’s knowledge about politics, the 
U term may encompass intrinsic intelligence, 
motivation, or family socialization processes 
that produced individuals who are generally 
higher or lower on knowledge than would be 
expected from the values of the observed Xs in 
the model. The $U_i$ are referred to in the econometric
literature as individual-specific, or unit,
effects.

In cross-sectional analyses, the unobserved 
stable U factor(s) are folded into the equation’s 
unknown error term and the analyst can do lit- 
tle if anything about it. If the U are uncorrelated 
with the observed $X_j$ variables, this would mean
that the equation’s explained variance is less 


³ This is the motivating logic behind the class of panel
estimators known as “generalized estimating equa- 
tions (GEE).” See Hardin and Hilbe (2003) and Zorn 
(2001), and also Hardin and Hilbe, Chapter 31 in this 
volume. 




than it might otherwise be, with inefficiency in 
the estimation of the standard errors for the $\beta_j$
regression coefficients. But if (as seems likely)
the U are related in some way to the observed $X_j$,
then the corresponding estimates of $\beta_j$ will be
biased. The potentially confounding effect of
omitted variables is, of course, one of the most
serious problems in nonexperimental research 
of any kind. 


2.1 Fixed effects models 


With panel data, however, at least some head- 
way in attacking the problem can be made. 
Equation (2) may be rearranged to show that 
the presence of U implies that each unit has 
its own intercept ($\alpha + U_i$), where $U_i$ may be
viewed as all the stable unit-level factors that
lead that case to be larger or smaller than the
overall average intercept ($\alpha$) for the dependent
variable in the sample. This suggests that one 
way of dealing with the problem with panel 
data would be to estimate equation (2) with OLS 
by including a dummy variable for N-1 units, 
with the coefficient on the dummies represent- 
ing the individual-specific effects for each case 
(relative to an omitted baseline unit). This pro- 
cedure is referred to as the LSDV (least squares 
dummy variables) method, and produces consistent
estimates of the $\beta_j$ coefficients for the
$X_j$, controlling for stable unit effects that push
the intercept for that case above or below the
common (or baseline) intercept $\alpha$. In this way
we see that panel data can use the multiple 
observations on cases over time to begin to con- 
trol for the effects of some kinds of variables 
that are not measured or observed in a given 
dataset. 

The LSDV method is not usually applied, 
however, due to the need to include a poten- 
tially enormous number of dummy variables 
in large-N panel studies. A more common 
approach is to first express equation (2) in terms 
of the unit-level means of all observed variables, 
as in 
$$\bar{Y}_i = \alpha + \beta_1 \bar{X}_{1i} + \beta_2 \bar{X}_{2i} + \cdots + \beta_J \bar{X}_{Ji} + U_i + \bar{\varepsilon}_i \qquad (3)$$


This formulation is called the “between” equa- 
tion, as all “within-unit” variation over time is 
averaged out, leaving a model that only con- 
siders variation between the individual units. 
Subtracting (3) from (2) yields the fixed effects 
(FE) model that sweeps the U terms out of the 
equation altogether: 


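$$Y_{it} - \bar{Y}_i = \beta_1 (X_{1it} - \bar{X}_{1i}) + \beta_2 (X_{2it} - \bar{X}_{2i}) + \cdots + \beta_J (X_{Jit} - \bar{X}_{Ji}) + (\varepsilon_{it} - \bar{\varepsilon}_i) \qquad (4)$$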


Given that $U_i$ is constant over time, $\bar{U}_i = U_i$
and hence $U_i$ drops out of equation (4). Estimating
the resulting regression with OLS (with an
appropriate adjustment for the standard errors 
due to the N number of unit means figuring in 
the estimation) yields consistent estimates of 
the $\beta_j$ effects of the observed $X_j$ variables.⁴ The
fixed effects estimator is also referred to as the
“within” estimator, as it considers only variation
in X and Y around their unit-level means,
as all “between unit” variation (i.e., differences
in $\bar{X}_i$ and $\bar{Y}_i$) is eliminated through the
mean-differencing procedure.
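
The mean-differencing logic can be illustrated with a minimal Python sketch. The data below are hypothetical and deliberately noise-free: Y is generated as 1 + 2X + U, with the unit effect U built to be positively correlated with X. The helper names `ols_slope` and `demean_by_unit` are assumptions made for this illustration, and a single regressor is used to keep the arithmetic transparent.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_unit(values, units):
    """Subtract each unit's own mean from its observations."""
    totals, counts = {}, {}
    for v, u in zip(values, units):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for v, u in zip(values, units)]

# Hypothetical noise-free panel: Y = 1 + 2*X + U, with U correlated with X.
units, X, Y = [], [], []
for i, u_i in enumerate([-2.0, -1.0, 0.0, 1.0, 2.0]):
    for t in range(3):
        x = u_i + t * (1.0 + 0.1 * i)   # X rises with the unit effect
        units.append(i)
        X.append(x)
        Y.append(1.0 + 2.0 * x + u_i)

pooled = ols_slope(X, Y)
within = ols_slope(demean_by_unit(X, units), demean_by_unit(Y, units))
print(f"pooled OLS slope: {pooled:.3f}")
print(f"within (FE) slope: {within:.3f}")
```

In this constructed case the pooled slope is pulled above the true value of 2 because U is omitted, while the within slope recovers it, since the stable U term is removed by subtracting the unit means.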

A related approach to sweeping out the 
individual-level $U_i$ is simply to lag equation (2)
by one time period, as in: 


$$Y_{i,t-1} = \alpha + \beta_1 X_{1i,t-1} + \beta_2 X_{2i,t-1} + \cdots + \beta_J X_{Ji,t-1} + U_i + \varepsilon_{i,t-1} \qquad (5)$$


⁴ That is, since N degrees of freedom are used in the
calculation of the unit means, the appropriate df for
the fixed effects model is NT-N-k-1. The constant
term $\alpha$ in (2) may also be recovered by adding the
grand mean for Y and each of the $X_j$ to each of the
deviation expressions in (4). Both of these adjust- 
ments are implemented automatically in STATA and 
other software packages for estimating econometric 
panel models. 


Subtracting this equation from (2) yields the 
first difference (FD) equation that also produces 
consistent estimates of the $\beta_j$:


$$Y_{it} - Y_{i,t-1} = \beta_1 (X_{1it} - X_{1i,t-1}) + \beta_2 (X_{2it} - X_{2i,t-1}) + \cdots + \beta_J (X_{Jit} - X_{Ji,t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1}) \qquad (6)$$


While both FD and FE methods are consistent, 
the fixed effect procedure is more commonly 
applied in multiwave panel data, as it makes 
use of all of the over-time variation in X in 
its calculations, and provides a more parsimo- 
nious method of expressing X and Y at a given 
time period as deviations from an overall mean 
value. Subtracting $X_{t-1}$ in the first difference
approach is inefficient in the sense that only
one of the lag values of X is used to difference
out the unit effect, and the one that is used is
to some extent arbitrary (why not subtract $X_{t-2}$
or $X_{t-3}$?).
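
A small Python sketch may help to show how the two transformations relate. In the special case of a two-wave panel with no wave dummy, the first difference and mean-differenced estimates of a single slope coincide exactly. The data, the error values, and the helper `slope_through_origin` are hypothetical and serve only as an illustration.

```python
def slope_through_origin(x, y):
    """OLS slope with no intercept, as suits already-differenced data."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical two-wave panel: (unit effect, X at waves 1-2, idiosyncratic errors).
panel = [
    (-1.0, (0.0, 1.0), (0.2, -0.1)),
    (0.0, (1.0, 3.0), (-0.3, 0.1)),
    (1.0, (2.0, 2.5), (0.1, 0.4)),
    (2.0, (3.0, 5.0), (0.0, -0.2)),
]
beta = 2.0

dx, dy, wx, wy = [], [], [], []
for u, (x1, x2), (e1, e2) in panel:
    y1 = 1.0 + beta * x1 + u + e1
    y2 = 1.0 + beta * x2 + u + e2
    # First differences: the stable unit effect u cancels.
    dx.append(x2 - x1)
    dy.append(y2 - y1)
    # Mean deviations (the within transformation) for the same unit.
    xbar, ybar = (x1 + x2) / 2, (y1 + y2) / 2
    wx += [x1 - xbar, x2 - xbar]
    wy += [y1 - ybar, y2 - ybar]

fd = slope_through_origin(dx, dy)
fe = slope_through_origin(wx, wy)
print(f"FD slope: {fd:.4f}, FE slope: {fe:.4f}")
```

With three or more waves the two estimates generally differ, which is where the efficiency considerations discussed above come into play.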

The FE and FD procedures are relatively 
simple, yet powerful methods for panel mod- 
els controlling for unobserved heterogeneity 
between cases. As Allison (1994) and Hal- 
aby (2004) have shown, these methods are 
also equivalent to the popular “difference in 
difference estimator” (DID) for assessing treat- 
ment effects in quasi-experimental and pol- 
icy research. If, as is plausible in almost all 
nonexperimental research situations, there are 
unobserved differences between members of 
the treatment and control groups, then it may 
be the case that such differences and not 
the treatment itself may produce the observed 
differences between the two groups. Construct- 
ing the difference between both the treat- 
ment group’s and control group’s pre-test score 
and post-test scores, or, in multiwave data, 
between the treatment group’s and control 
group’s pre-test and post-test mean-deviated 
scores, essentially removes the unobserved (sta- 
ble) differences between the two groups from 
consideration. The “difference in differences” 
between the treatment and control groups is 
then the pure effect of the treatment, control- 
ling for these unobserved differences (and any 
observed covariates that are included in the
model as additional controls).⁵ Allison (1994)
shows how this logic can be applied to esti- 
mating the effect of a variety of treatments with 
panel data, where the treatments may be given 
to individuals who were not randomly assigned 
to different conditions in quasi-experimental 
conditions, or where the “treatments” were sim- 
ply the experience of some life event such as 
divorce, military service, unemployment, or job 
promotion between panel waves. Whenever sta- 
ble, unobserved factors at the unit level may be 
related to the unit’s likelihood of experiencing
one of these events and also related to the unit’s
value on some outcome variable, the FE or FD 
approach sweeps out the (stable) unobservables 
to produce consistent estimates of the effects of 
the event or treatment itself.⁶
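
The difference in differences logic can be made concrete with a short Python sketch. The scores below are hypothetical: the treated group is constructed to start two points higher than the control group for reasons unrelated to the treatment, so that a simple post-test comparison overstates the effect.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical pre/post scores; the treated group starts higher because of
# stable unobserved differences, not because of the treatment.
treated_pre, treated_post = [5.0, 6.0, 7.0], [7.5, 8.5, 9.5]
control_pre, control_post = [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]

change_treated = mean(treated_post) - mean(treated_pre)   # change for treated
change_control = mean(control_post) - mean(control_pre)   # general time effect
did = change_treated - change_control
naive = mean(treated_post) - mean(control_post)
print(f"difference in differences: {did}")
print(f"naive post-test difference: {naive}")
```

Here the naive post-test gap of 3.5 mixes the treatment effect with the pre-existing gap of 2.0, whereas the difference in differences of 1.5 removes both the stable group difference and the change common to both groups.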

Despite their clear strengths in eliminating 
the potentially confounding effects of unobserved
heterogeneity, the FE and FD models
have certain features that render them prob- 
lematic for some panel analyses. First, the 
differencing or mean-difference process elim- 
inates not only the stable unobserved factors 
from consideration, but also all stable observed 
factors, such that the researcher can say noth- 
ing about the effects on the dependent variable 


⁵ The logical equivalence of the DID estimator and the
FE estimator is ensured only when a dummy variable 
for the wave of observation is also included in the FE 
model. This presents no special statistical problems 
and results in the “two-way” fixed effects model. 
This addition is necessary because the FE (and FD) 
models eliminate from consideration those observa- 
tions that do not change on X over time; the inclusion 
of the time dummies thus captures the general time 
effect on Y that occurs for both treatment and control 
groups, with the additional impact of the treatment 
being given by its coefficient. 

⁶ It is also possible to test for the statistical significance
of the unit effects as a whole through estimation
of a nested F test that compares the R-squared
of models with and without the inclusion of the N-1 
unit effects. 
of characteristics of countries, individuals,
or other units that do not change over
time. In many research situations, the effects of 
(nearly) stable individual-level factors such as 
education or sex, or, at the country level, stable 
political characteristics such as electoral sys- 
tems or institutional arrangements are of prime 
theoretical interest, and FE/FD models are less 
attractive.⁷ Second, as was noted above, the
FE/FD method uses 1/T of the available degrees 
of freedom, which in short panels can be a rel- 
atively high cost. Third, it is also the case that 
the estimates of the unit-effect in short-term 
panel studies are based on only a few waves 
of observation, and hence may be unreliable to 
the extent that chance factors produce a few 
consistently high or low readings on the depen- 
dent variable for a given case over time. Since 
FE/FD methods take these unit-effects as given, 
they potentially overstate the “true” amount of 
temporal dependence produced by stable unobservables.⁸


⁷ Interaction effects between time and stable unit-level
factors may, however, enter FE models, so that
the researcher may estimate how the impact of a sta- 
ble variable changes across waves of the panel. 

⁸ It is also argued that FE methods are more appropriate
in instances where the analyst wishes to make
inferences conditioned on the observed units in the 
sample, as in analyses at the country level where, 
for example, Germany’s unit effect is of interest and 
would be the same no matter how many different 
country samples are drawn. In instances where large 
numbers of individuals are sampled randomly from 
some population (and hence the specific individ- 
ual effects are of less interest), the random effects 
approach to be discussed subsequently is arguably 
more appropriate. See Allison (1994) and Teachman 
et al. (2001) for counter-arguments on this point, 
claiming that FE and RE are simply alternative ways 
of dealing with the “nuisance” posed by the presence 
of the unit effects in (2) above. 
2.2 Random effects models


An alternative approach to estimating
panel models with unobserved heterogeneity,
the random effects or random intercepts (RE) 
model overcomes some of these deficiencies, 
though not without some additional costs of its 
own. Consider the two sources of unobserved 
error in equation (2): the unobserved unit effect
($U_i$) and the unit-time-specific idiosyncratic
error term $\varepsilon_{it}$. In the “random effects” approach,
both of these sources of error are treated as the
realization of random processes and assumed to
be independent, normally-distributed variables
(with variances denoted as $\sigma_u^2$ and $\sigma_\varepsilon^2$, respectively).
Some units have a higher intercept on
the dependent variable because of a large random
U term, some units have a lower intercept
because of a small random U term; added to
this error is the idiosyncratic error $\varepsilon_{it}$, which
produces further deviations from the linear prediction
of Y from the U and the $\beta_j X_j$ terms of
the model. The only additional unknown in this
setup compared to the pooled OLS panel model
of equation (1) is $\sigma_u^2$, and thus one immediate
benefit of the random effects model is that it 
saves a significant number of degrees of free- 
dom compared to its FE or FD counterparts. 
Estimation of the RE model proceeds under 
two assumptions. First, the two components of 
the composite error term, $U_i$ and the $\varepsilon_{it}$, are
assumed to be unrelated, otherwise no sepa- 
rate estimate of each would be possible. Sec- 
ond, and more important, both error terms must 
be assumed to be unrelated to the included 
X variables in the model, i.e., $E(X_{jit} U_i) = E(X_{jit} \varepsilon_{it}) = 0$.
This is of course problematic, in that
the setup assumes away the possible correla- 
tion between X and the unit-effect unobserv- 
ables that prompts many panel analyses in the 
first place (!). The situation need not be so dire,
however, as will be discussed in more detail 
below. Given these assumptions, the composite
error term of the model, $U_i + \varepsilon_{it}$, has a fixed
structure over time, with the variances (diagonal
elements) being equal to $\sigma_u^2 + \sigma_\varepsilon^2$, and
the covariances (off-diagonals) being equal to
$\sigma_u^2$ for every time period. As such, estimation
can proceed through feasible generalized least 
squares (FGLS) methods that weight the model 
by the inverse of the error variance—covariance 
matrix, in this case the weight $\theta$ calculated as:

$$\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{T\sigma_u^2 + \sigma_\varepsilon^2}} \qquad (7)$$

Once an estimate of $\theta$ is obtained through
manipulation of the “within” and “between”
regressions of equations (3) and (4) above,⁹ the
model is then transformed as: 


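$$Y_{it} - \theta \bar{Y}_i = \beta_1 (X_{1it} - \theta \bar{X}_{1i}) + \beta_2 (X_{2it} - \theta \bar{X}_{2i}) + \cdots + \beta_J (X_{Jit} - \theta \bar{X}_{Ji}) + (U_i - \theta \bar{U}_i) + (\varepsilon_{it} - \theta \bar{\varepsilon}_i) \qquad (8)$$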


with this equation’s error term (comprised of 
the two last terms in parentheses) now having 
constant variance and zero correlation across 
time units for each case. 
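
The weight in equation (7) and the partial mean-differencing in equation (8) can be sketched as follows. The function names `re_theta` and `quasi_demean` are hypothetical, the variance components are supplied directly rather than estimated, and a balanced panel with T waves per unit is assumed.

```python
import math

def re_theta(sigma2_u, sigma2_e, T):
    """FGLS weight of equation (7) for a balanced panel with T waves."""
    return 1.0 - math.sqrt(sigma2_e / (T * sigma2_u + sigma2_e))

def quasi_demean(values, theta):
    """Subtract theta times the unit mean from one unit's observations."""
    unit_mean = sum(values) / len(values)
    return [v - theta * unit_mean for v in values]

print(re_theta(0.0, 1.0, 3))    # no unit-level variance: theta = 0 (pooled OLS)
print(re_theta(1.0, 0.0, 3))    # no idiosyncratic variance: theta = 1 (FE)
print(quasi_demean([1.0, 2.0, 3.0], 1.0))  # theta = 1 gives full mean-differencing
```

The two limiting cases show why the random effects estimator sits between pooled OLS and fixed effects: a weight of 0 leaves the data untransformed, and a weight of 1 reproduces the within transformation.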

Several features of the RE model are especially
noteworthy. First, it can be seen that as $\theta$
approaches 1, more and more of
the composite error variance is made up of unit-level
(between) variance $\sigma_u^2$. In the unlikely
event that $\theta$ equals 1, all of the error variance
is unit-level variance, and the RE estimator
reduces to the FE mean-differenced estimator
of the $\beta$. As $\theta$ gets closer to 0, more and more
of the error variance is made up of the random
$\sigma_\varepsilon^2$ component, with no unit-level variance to
take into account, and the RE estimator thus
reduces to the pooled OLS approach represented
by equation (1). So the RE approach represents
something of a middle ground between


⁹ An estimate of $\sigma_\varepsilon^2$ is obtained from the “within”
regression of (4), as all of the unit-level variation
has been purged from this model. An estimate of
$\sigma_u^2$ is obtained by manipulating the error term of
the “between” regression of (3), which produces the
error variance ($\sigma_u^2 + \sigma_\varepsilon^2/T$).
the FE and OLS models, weighted toward one or
the other depending on how much of the error
variance is comprised of unit-specific versus
idiosyncratic components.¹⁰ It may also be said
that RE “adjusts” or “shrinks” the FE estimator
back toward pooled OLS to the extent that
the unit-level effects in general are either small,
relative to overall error, or are unreliable due
to a relatively small T (as can be seen from the
denominator of equation (7)).¹¹

Second, the RE model produces estimates of 
the effects of both changing and stable indepen- 
dent X variables, as its estimation equation (8) 
does not result in the elimination of any stable 
variable so long as $\theta$ is not 1 (or very close to 1,
when estimates of stable variables will tend to be 
very imprecise). The ability to model Y as func- 
tions of both changing and unchanging X vari- 
ables over time is one of the major advantages 
of the RE approach to unobserved heterogeneity; 
as noted, however, this (and other) advantages 
of RE may be enjoyed only to the extent that the 
assumptions of the model hold, i.e., that there is 
no correlation between the $X_j$ and the $U_i$.

For this reason, there has been much debate 
in the panel literature over the applicability of 
FE versus RE. The “Hausman” test provides one 
way of adjudicating the dispute, by providing a 
test statistic to assess the significance of the dif- 
ference between the FE and RE estimates. The 
logic behind the test is that, if the assumptions 
of the RE model hold, then FE and RE are two 
different ways to arrive at consistent estimates 
of the $\beta$, but RE is more efficient. If the assumptions
of the RE model do not hold, then RE will
be inconsistent, while FE will always be consis- 
tent. Thus, one should see similar estimates of 


¹⁰ The proportion of total variance comprised of unit-specific
variance is also referred to as the “intraclass
correlation coefficient.” 

¹¹ It can also be seen from equation (7) that the RE and
FE converge as $T \to \infty$, so the issue of fixed versus
random effects is not as relevant in large T panel (or 
time-series cross-sectional) studies. 
the $\beta_j$ if the RE assumptions hold, and different
ones if they don’t. Hausman (1978) showed that
the statistic

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\mathrm{var}(\hat{\beta}_{FE}) - \mathrm{var}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

is distributed as $\chi^2$ with J degrees of freedom,
with failure to reject the null hypothesis imply- 
ing that the RE model is appropriate. If the 
Hausman null hypothesis is rejected, then it 
may be concluded that there is some viola- 
tion of the RE assumptions and that a likely 
nonzero correlation between the $X_j$ and the
$U_i$ exists which the analyst should take into
account. One way to do so is through the FE or
FD methods that eliminate the $U_i$ from consideration
in the estimation equation altogether.
Another is by including in a random effects 
version of (2) the unit-level mean for each time-varying
independent variable (i.e., $\bar{X}_{ji}$) as additional
predictors (Skrondal and Rabe-Hesketh,
2004, pp. 52-53). The RE “problem” may thus 
be viewed as an omitted variable issue, with 
the unit effects being potentially correlated (at 
the unit-level) with the included explanatory 
factors. Once the time-varying unit-level means 
are included, this correlation is essentially con- 
trolled for in the model, with the resultant com- 
posite error term satisfying the assumptions for 
FGLS estimation.¹²
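
For a single coefficient, the Hausman statistic reduces to the squared difference between the FE and RE estimates divided by the difference in their sampling variances, with one degree of freedom. The following Python sketch computes this scalar version only; it is a simplification of the joint test over all J coefficients, the function name `hausman_scalar` is hypothetical, and the input values are chosen purely for illustration.

```python
import math

def hausman_scalar(b_fe, b_re, se_fe, se_re):
    """Hausman statistic for one coefficient; chi-square with 1 df under H0."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("var(FE) must exceed var(RE) for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2.0))  # upper tail of chi-square, 1 df
    return h, p_value

# Hypothetical estimates and standard errors, chosen for illustration only.
h, p = hausman_scalar(b_fe=0.21, b_re=0.31, se_fe=0.10, se_re=0.07)
print(f"H = {h:.2f}, p = {p:.3f}")
```

The guard on the variance difference reflects a practical point: in finite samples the estimated difference can be non-positive, in which case the scalar statistic is undefined and the full matrix version or an alternative test is needed.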


2.3 Example: The effects of civic education 
on political knowledge in Kenya, 2002-2003 


We illustrate these models with panel data col- 
lected on 401 individuals interviewed at three 


¹² Plümper and Troeger (forthcoming) provide a similar
method for incorporating unit-level means into
a fixed effect model. In both cases, it is still necessary
to assume that the time-invariant $X_j$ are unrelated
to $U_i$. See Hausman and Taylor (1981) for an
approach to this problem involving different assump- 
tions about correlations between particular time- 
invariant X variables with the unit effects. 
waves between February 2002 and June 2003 in
Kenya, as part of a study evaluating the effects 
of attending civic education and democracy 
training workshops on changes in democratic 
knowledge, attitudes, and participation in the 
run-up to the Kenyan December 2002 presidential
elections.¹³ A total of 210 respondents were
interviewed before they attended a civic educa- 
tion (CE) workshop between February and April 
2002, with 191 individuals serving as the con- 
trol group, as they were selected to match the 
“treatment” group on place of residence, gen- 
der, age, and educational status. There were two 
follow-up interviews for all respondents, one 
in November, some 7—9 months after the work- 
shop, and another in April-May 2003, about 
one year after the initial workshop took place. 
Our concern here is whether the workshop 
exposure (the “treatment”) led to significant 
changes in respondents’ knowledge of politics, 
measured with four questions asking the name 
of various elected officials (the Vice President 
and the Provincial Commissioner) and various
institutional provisions in the Kenyan political 
system (e.g., the length of the term of office of 
the President and the procedures for amending 
the Kenyan constitution). 

Given that individuals selected themselves 
into the “treatment” group, in that attendance 
at local civic education workshops was purely 
voluntary, it is likely that the treatment group 
differs from the control group on some mea- 
sured variables, such as interest in politics, 
and also on unmeasured factors that may 
influence political knowledge such as intrin- 
sic intelligence, motivation, openness to polit- 
ical reform, personal discussion networks, and 
the like. To control for these potentially con- 
taminating effects, a series of unobserved het- 
erogeneity models were estimated with the 


¹³ For more information on the overall study, including
details on the sampling and questionnaire design,
see Finkel (2003), available at www.pitt.edu/~finkel. 


three-wave data, and the results are shown in 
Table 29.1. 

The model in the first column of the table 
shows the pooled OLS model of (1) above, i.e., 
one that contains no unobserved heterogene- 
ity term whatsoever. In this model, individu- 
als exposed to civic education workshops are 
on average .32 higher on the dependent vari- 
able after exposure than individuals in the con- 
trol group, holding interest, education, sex, and 
age constant. The dummy variables for wave 
2 and wave 3 show that, controlling for other 
independent variables, there is a .31 increase 
in knowledge in wave 2 compared to wave 
1 (the baseline wave of the panel), and a .18 
increase in knowledge in wave 3 compared to 
wave 1 for all individuals, including those in 
the control group. In the fixed effects model 
of column (2), the unit-specific effect speci- 
fied in the theoretical model of (2) above is 
swept out through the mean-differencing pro- 
cess, leaving the “within” regression of indi- 
vidual deviations from their own means. In 
this model, the effect of civic education falls 
to .21, approximately two-thirds of its magni- 
tude in the pooled model, though still signif- 
icantly different from zero. Note that in the 
FE model, there are no estimates for the time- 
invariant factors of education, sex, and age, as 
they drop out of the model (along with $U_i$)
through mean-differencing. Thus the FE model 
shows that exposure to civic education has a 
significant impact on later political knowledge, 
controlling for observed and unobserved sta- 
ble factors at the individual level that may 
be correlated with both CE exposure and with 
knowledge. 

An alternative random effects model is shown 
in column (3) of the table. As can be seen, 
the estimates are much closer to the pooled 
OLS model in column (1) than the fixed effects 
estimates, with CE now having a .31 effect on 
knowledge. Estimates of the stable observed 
control variables are also similar to their OLS 
values, as they should be, given the relatively 
Table 29.1 Unobserved heterogeneity models, Kenya three-wave civic education study

Variable                  (1)       (2)           (3)       (4)
                          Pooled    Fixed         Random    Random effects with
                          OLS       effects       effects   unit-level means
Civic education           .32       .21           .31       .21
                          (.05)     (.10)         (.07)     (.10)
Political interest        .27       [illegible]   .24       .12
                          (.04)     (.05)         (.04)     (.05)
Education                 .20       (dropped)     .20       .19
                          (.02)                   (.02)     (.06)
Male                      .47       (dropped)     .48       .43
                          (.05)                   (.06)     (.06)
Age                       -.005     (dropped)     -.005     -.005
                          (.002)                  (.002)    (.002)
Wave 2 dummy              .31       [illegible]   .33       .41
                          (.07)     (.08)         (.07)     (.08)
Wave 3 dummy              .18       [illegible]   .20       .28
                          (.07)     (.08)         (.07)     (.08)
Treatment mean                                              .13*
                                                            (.13)
Political interest mean                                     .32
                                                            (.09)
Constant                  .78       2.01          .85       .33
                          (.15)     (.13)         (.17)     (.21)
Adj. R-squared            .26       [illegible]   .27       .28
Degrees of freedom        1193      [illegible]   1192      1190
Intraclass correlation    ----      [illegible]   .18       .18
Estimated θ               ----      [illegible]   .22       .22

All coefficients statistically significant at p<.05 except *. Standard Errors in Parentheses. N = 401. 


small estimated value for $\theta$ (.22). This model,
however, assumes no correlation between the 
treatment or other observed variables and the 
random unit-level error term; a Hausman test 
shows that this assumption is likely to be vio- 
lated, as the null hypothesis of no difference 
between the FE and RE models can be rejected 
($\chi^2$ = 13.88, p < .01). Thus the choice here is
between a FE specification, or a random effects 
specification with the unit-level means con- 
trolled (model 4). In both instances the CE effect 
is estimated to be .21; the difference between 
the models, aside from their assumption about 
the distribution of the $U_i$, is that the RE model
also provides information on the effects of the 


time-invariant control variables that are differenced
out of the FE specification.¹⁴
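
As a rough check on Table 29.1, equation (7) can be rewritten in terms of the intraclass correlation by dividing its numerator and denominator by the total error variance. The calculation below assumes that the reported intraclass correlation of .18 (denoted here, as a labeling assumption, by $\rho$) is the share of the composite error variance due to the unit effects, and that all units are observed at T = 3 waves. Under those assumptions the result is close to the estimated weight of .22 reported in the table:

$$\theta = 1 - \sqrt{\frac{1-\rho}{T\rho + (1-\rho)}} = 1 - \sqrt{\frac{.82}{3(.18) + .82}} = 1 - \sqrt{.603} \approx .22$$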


¹⁴ It should be noted that the specification of the effect
of CE in all of the models in Table 29.1 is that of 
an additive, permanent change, such that CE leads 
to an increase in knowledge in the immediate time 
period after exposure, with the effects persisting over 
time. Moreover, the effect of time itself is modeled 
with dummy variables for each time period. Alli- 
son (1994) shows other ways to model the impact of 
events, as well as alternative models of time effects. 
Random effects models that incorporate randomly- 
varying time-trends are also referred to as “hierarchi- 
cal longitudinal growth models” (see Raudenbush and 
Bryk, 2002, and also Luke, Chapter 33 in this volume). 
3 Dynamic panel analysis 


To this point, it has been assumed that all of 
the temporal dependence in the model structure 
was rooted in the presence of the unit-specific 
effect (be it fixed or randomly distributed). That 
is, the only reason that a $Y_i$ response at time 
$t$ would be correlated with a $Y_i$ response at 
time $t+1$ is the presence of the stable $U_i$ term, 
which influences the response for the individ- 
ual unit at all time periods. Nothing else in the 
models considered thus far suggests a dynamic 
process involving time-dependence in either 
the core of the model or in the idiosyncratic 
error terms. The longitudinal nature of the data, 
in other words, has been used primarily as a 
means to rid the model of the effects of the nui- 
sance $U_i$ term, and not to model any kind of 
dynamic processes per se. Controlling for unob- 
served stable unit effects is highly important 
for panel analysis, but it is often insufficient 
to take into account all of the temporal depen- 
dence in the data. Thus much of panel analy- 
sis is devoted to alternative means of modeling 
temporal dependence, either instead of, or in 
addition to, the heterogeneity models we have 
considered thus far. 


3.1 Autocorrelated disturbances 


One kind of additional temporal dependence 
is caused by correlations between the idiosyn- 
cratic error terms $\varepsilon_{it}$ of successive panel waves. 
These autocorrelated disturbances could be 
the result of exogenous random shocks to the 
system that persist for several time periods, 
omitted variables that change over time, or cor- 
related errors of measurement from one panel 
wave to the next. Such a model could be repre- 
sented as in (2) above with the additional stip- 
ulation that the error terms are autocorrelated, 
as in: 


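$$Y_{it} = \alpha + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + U_i + \varepsilon_{it} \qquad (9a)$$


$$\varepsilon_{it} = \rho\,\varepsilon_{it-1} + v_{it}, \quad \text{where } v_{it} \sim N(0, \sigma^2) \qquad (9b)$$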


This model is estimated in two stages. The first 
proceeds by estimating $\rho$ through one of many 
available methods commonly used in time-series 
analyses (see Baltagi, 2005), and then weighting 
equation (9a) by the $\rho$ estimate to produce a model 
with the well-behaved error term $v_{it}$. In the sec- 
ond stage, either a fixed effects or random effects 
specification for the $U_i$ is assumed, with estima- 
tion on the transformed equation (9a) proceeding 
as in the models discussed earlier, either with 
mean-differencing (fixed effects) or estimation of 
the variance components of the error term and 
“$\theta$-mean differencing” (random effects). One wave 
of observations is lost with the first-stage differ- 
encing procedure, so this model requires at least 
three waves of data. 
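
The two stages can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single $X$ variable, a fixed effects second stage, and a value of $\rho$ that is simply supplied rather than estimated. The function names and the small noise-free panel are hypothetical and are not taken from the chapter.

```python
def quasi_difference(series, rho):
    """Return y_t - rho * y_{t-1} for t = 2..T; the first wave is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def demean(series):
    """Subtract the unit's own mean (the fixed effects 'within' transformation)."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


def two_stage_fe_slope(y_panel, x_panel, rho):
    """Slope of y on one x after quasi-differencing, then mean-differencing.

    y_panel and x_panel are lists of per-unit lists (one value per wave).
    rho is taken as given; in practice it comes from a first-stage estimate.
    """
    numerator = 0.0
    denominator = 0.0
    for y_unit, x_unit in zip(y_panel, x_panel):
        y_star = demean(quasi_difference(y_unit, rho))
        x_star = demean(quasi_difference(x_unit, rho))
        numerator += sum(xs * ys for xs, ys in zip(x_star, y_star))
        denominator += sum(xs * xs for xs in x_star)
    return numerator / denominator


# Hypothetical noise-free panel: 2 units, 3 waves, y = 2*x + U_i.
x_panel = [[1.0, 2.0, 4.0], [0.0, 3.0, 5.0]]
unit_effects = [10.0, -4.0]
y_panel = [[2.0 * x + u for x in xs] for xs, u in zip(x_panel, unit_effects)]

slope = two_stage_fe_slope(y_panel, x_panel, rho=0.5)
waves_left = len(quasi_difference(x_panel[0], 0.5))
```

With three waves, quasi-differencing leaves two transformed observations per unit, which is the minimum needed for the mean-differencing step; this is why the model requires at least three waves of data.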


3.2 Lagged endogenous variable models 


An alternative model of temporal dependence 
in $Y_i$ is perhaps more prevalent in the panel lit- 
erature. In this model the $Y_{it}$ response is deter- 
mined by a series of $X$ variables, either at time 
$t$ and/or at a lag of $t-1$, along with the lagged 
value of $Y_i$, as in: 


$$Y_{it} = \alpha + \beta_1 Y_{it-1} + \beta_2 X_{2it-1} + \beta_3 X_{3it-1} + \cdots + \beta_k X_{kit-1} + \varepsilon_{it} \qquad (10)$$


In this model, the lag value of Y (the “lagged 
endogenous variable”) has a direct effect on the 
value of Y at the next time point, along with 
effects specified from prior values of X as well. 
We may include contemporaneous levels of X 
in the model as well, but for now we focus 
on the lagged effects for both the $X_{t-1}$ and $Y_{t-1}$. 
The model captures the temporal dependence 
of adjacent responses on Y neither through their 
joint relationship with some stable unobserved 
$U$ term, nor through the autocorrelation of adja- 
cent unknown idiosyncratic errors $\varepsilon_{it}$, but rather 
because of the direct influence of Y at a given 
point in time on subsequent responses. With 
some simple algebra it can be shown that the 
model is equivalent to predicting the change 
in Y from its prior value, with the coefficient 
on lagged $Y$ in the change-score version of the 
model being equal to $(\beta_1 - 1)$: 


$$Y_{it} - Y_{it-1} = \alpha + (\beta_1 - 1) Y_{it-1} + \beta_2 X_{2it-1} + \beta_3 X_{3it-1} + \cdots + \beta_k X_{kit-1} + \varepsilon_{it} \qquad (11)$$


This means that, so long as the lagged depen- 
dent (endogenous) variable is included on the 
right-hand side, there is no difference between 
the expression of the model in terms of “static” 
scores in Y or as a “dynamic” change-score 
model in $\Delta Y$. Further, the effects of the $X$ vari- 
ables are exactly the same, regardless of the 
specification of (10) or (11). 
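
The equivalence can be checked numerically. The sketch below is illustrative only: it simulates hypothetical data with one lagged $X$, fits the “static” and change-score versions by ordinary least squares using a small hand-written solver (the helper `ols` is an assumption of this example, not a routine from the chapter), and compares the coefficients.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(0)  # fixed seed so the illustration is reproducible
y_lag = [rng.gauss(0, 1) for _ in range(200)]
x_lag = [rng.gauss(0, 1) for _ in range(200)]
y_now = [0.5 + 0.6 * yl + 0.3 * xl + rng.gauss(0, 1)
         for yl, xl in zip(y_lag, x_lag)]

design = [[1.0, yl, xl] for yl, xl in zip(y_lag, x_lag)]  # constant, Y(t-1), X(t-1)
static = ols(design, y_now)                                # equation (10)
change = ols(design, [yn - yl for yn, yl in zip(y_now, y_lag)])  # equation (11)
```

The intercept and the coefficient on lagged $X$ agree across the two fits, while the coefficient on lagged $Y$ in the change-score fit equals the static coefficient minus 1.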

The model of equation (10) or (11) differs 
from those we have considered thus far in sev- 
eral important ways. One is the absence of the 
unit-specific error term $U_i$, meaning that it is 
assumed that all of the temporal dependence of 
responses over time is due to the causal mech- 
anism linking the lagged endogenous variable 
and the lagged (or contemporaneous) $X$ to $Y$ at 
any given point in time. This assumption can be 
relaxed, however, and we will examine models 
that include both lagged $Y$ and the unit-specific 
effects below. But the most fundamental differ- 
ence is the presence of lagged $Y$ as a predictor 
in equations (10) and (11) in the first place, and 
it is the inclusion of this term that has generated a 
good deal of controversy in the panel literature. 

All agree that whenever there are strong sub- 
stantive reasons for assuming that prior values 
of a variable have a direct causal effect on its 
subsequent value, the dynamic model is entirely 
appropriate. For example, in models of attitude 
formation and change, it is often assumed that 
there is some natural “state-dependence,” such 
that attitudes are determined directly by their 
prior values unless disturbed by some exoge- 
nous shock. Economic models of wealth may also 
assume that an individual’s store of accumulated 
wealth causes subsequent levels through dif- 
ferent investment decisions, employment, and 
educational opportunities, and the like. These 
processes would stand in contrast to variables 
that need to be “created anew” in each time 
interval, as, for example, behavior such as vot- 
ing or political participation where engaging in 
the activity at one point in time may not nec- 
essarily “cause” subsequent behavior. In those 
cases, there may be little theoretical justification 
for estimating the dynamic model. 

More controversial are the purely statistical 
reasons that have been advanced for includ- 
ing lagged Y in panel models. One is to serve 
as a proxy for unmeasured factors that lead to 
the response at both points in time, and many 
analysts argue that the heterogeneity models 
or autocorrelation models we have considered 
thus far deal more explicitly and more effec- 
tively with this problem (Allison, 1990; Liker, 
et al., 1985). Likewise, it has often been argued 
that including the lagged endogenous variable 
serves as a statistical control for “regression to 
the mean” effects, whereby change in a vari- 
able is typically negatively related to its initial 
value (as seen in the $(\beta_1 - 1)$ coefficient 
for $Y_{it-1}$ in equation 11). This occurs because high 
(low) initial values are likely to be the prod- 
uct of some random forces that are not likely 
to be repeated in subsequent observations. Oth- 
ers claim, however, that once random measure- 
ment errors are taken into account (with models 
that we consider below), regression to the mean 
is usually not sufficient to justify the inclu- 
sion of $Y_{it-1}$ in the model. These arguments are 
far from settled in the literature, but they do 
point to the need for researchers to consider 
carefully the “epistemological status” of the 
lagged endogenous variable (Arminger, 1987), 
and not enter it automatically in panel models. 


4 Structural equation panel models 


The dynamic specification of (10) is a com- 
mon starting point for panel analysis in the 
structural equation modeling (SEM) tradition. 
Instead of pooling the data across waves and 
estimating a single coefficient for each of the 
independent variables over the N*T units of 
observation, structural equation methods spec- 
ify and estimate a system of equations, one for 
each dependent (endogenous) variable at each 
wave of observation. The overall model rep- 
resents the interrelationships across all of the 
waves between the exogenous variables, i.e., 
those determined outside of the causal system 
and taken as “given,” and the endogenous vari- 
ables, which are determined by other exoge- 
nous or endogenous variables in the model. 
In principle, each of the equations—i.e., the 
equation for Y in wave 2, Y in wave 3, etc.— 
could be estimated separately, but the SEM 
approach uses the information provided by 
the variances and covariances between all the 
observed variables in the model, even those that 
are not directly related to one another in any of 
the model’s equations, to allow the more effi- 
cient simultaneous estimation of all the model’s 
parameters. In panel models, this cross-wave 
covariation, along with the intrinsic temporal 
ordering between variables at earlier and later 
waves, provides the researcher great flexibility 
in estimating complex models with reciprocal 
causal linkages between variables, and, under 
some conditions, models that allow random 
measurement error in the observed indicators 
over time. These are the principal ways that the 
structural equation approach is used by panel 
analysts, with the dynamic specification of (10) 
most often at the core of these models. 

The SEM approach may be illustrated with 
an example of the relationship between an indi- 
vidual’s partisan identification (i.e., the direc- 
tion and strength of his or her attachment to 
the Republican or Democratic Party in the US), 
and presidential approval (i.e., whether he or 
she approves of the performance of the sitting 
President in office). In the political science lit- 
erature, controversy rages over whether party 
identification is a stable characteristic which 
influences shorter-term perceptions such as 
candidate evaluations and presidential approval, or 
whether such short-term factors instead deter- 
mine partisan change over time (e.g., Green 
and Palmquist, 1990). A structural equation 
model of these alternative processes is depicted 
in Figure 29.1, with the variables of partisan 
identification and presidential approval mea- 
sured in three waves of observation in the 
National Election Panel Study of 2000-2002-2004. 
In order to facilitate the presentation 
through the chapter, the variables and coeffi- 
cients are labeled with the LISREL nomencla- 
ture which is widely utilized within the SEM 
tradition (Jöreskog and Sörbom, 1994; Kaplan, 
2000). 

The model shows that party identification 
and presidential approval measured in 2000 
are assumed to be exogenous variables, i.e., 
variables with causes outside the causal sys- 
tem, and are depicted as $\xi_1$ and $\xi_2$. Party iden- 
tification and presidential approval in 2002 
and 2004 are endogenous ($\eta_1$ through $\eta_4$), pre- 
dicted by their own previous value and the 
previous value of the other variable.¹⁵ That is, 


$\xi$ = Exogenous variable 
$\eta$ = Endogenous variable 
$\beta$, $\gamma$ = Regression coefficients 
$\phi$ = Variance-covariances of the $\xi$ exogenous variables 
$\psi$ = Variance-covariances of the $\zeta$ structural disturbances 
$\zeta$ = Structural disturbance of the $\eta$ 


Figure 29.1 Three-wave, cross-lagged panel model 


¹⁵ All variables are coded in a “pro-Republican” 
direction so that high values on approval indicate 
either disapproval of a Democratic President (in 
2000) or approval of a Republican President (in 2002 
and 2004). 
the model specifies a lagged effect from each 
variable on itself over time, and cross-lagged 
effects of party identification and presidential 
approval on each other. Each endogenous vari- 
able also has a random error term, depicted as $\zeta_1$ 
through $\zeta_4$. The structural effects linking the 
exogenous variables $\xi$ to the endogenous variables 
$\eta$ are labeled as $\gamma$ coefficients, and the struc- 
tural effects linking the endogenous variables 
to one another are labeled as $\beta$. Following com- 
mon SEM practice, all variables are expressed 
as deviations from their mean, which eliminates 
consideration of the intercept in all of the struc- 
tural equations.¹⁶ 

The four equations for the endogenous vari- 
ables may therefore be written as: 


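$$\eta_1 = \gamma_{11}\xi_1 + \gamma_{12}\xi_2 + \zeta_1 \qquad (12a)$$
$$\eta_2 = \gamma_{21}\xi_1 + \gamma_{22}\xi_2 + \zeta_2 \qquad (12b)$$
$$\eta_3 = \beta_{31}\eta_1 + \beta_{32}\eta_2 + \zeta_3 \qquad (12c)$$
$$\eta_4 = \beta_{41}\eta_1 + \beta_{42}\eta_2 + \zeta_4 \qquad (12d)$$


or in matrix form as 

$$\eta = B\eta + \Gamma\xi + \zeta \qquad (13)$$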


where 

$\eta$ = a vector of $m$ endogenous variables (in this 
case 4) 

$B$ = an $m$ by $m$ matrix of $\beta$ coefficients linking 
the $m$ endogenous variables (here 4 × 4) 

$\Gamma$ = an $m$ by $n$ matrix of $\gamma$ coefficients linking 
the $n$ exogenous to the $m$ endogenous (4 × 2) 

$\zeta$ = a vector of $m$ structural disturbances for the 
endogenous variables (here 4) 


¹⁶ LISREL and other SEM analysis packages do allow 
for the estimation of intercepts, and these kinds of 
models are widely used in the analysis of group dif- 
ferences (Sörbom, 1982) and in multilevel structural 
equation models (Muthén, 1994). 

Two other matrices are relevant from the 
point of view of specification and estimation: 
$\Phi$, the $n$ by $n$ matrix of the variances and covari- 
ances of the exogenous variables (here com- 
prised of three distinct elements, the variances 
of $\xi_1$ and $\xi_2$ and their covariance, represented 
by the curved arrow labeled $\phi_{21}$); and $\Psi$, the $m$ 
by $m$ matrix of the variances and covariances of 
the structural disturbances $\zeta$ (here 4 × 4, with 
the diagonal elements representing each equa- 
tion’s error variance and additional covariances 
estimated between the structural disturbances 
for $\eta_1$ and $\eta_2$ at wave 2, and for $\eta_3$ and $\eta_4$ 
at wave 3). These error covariances represent 
the residual covariance between party identi- 
fication and presidential approval at a given 
panel wave that cannot be explained through 
the stability and cross-lagged effects in the 
model. 
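
To make the matrix form concrete, the following Python sketch lays out the pattern of free and zero entries in $B$ and $\Gamma$ for the cross-lagged model and solves the system for given inputs. The coefficient values are hypothetical placeholders chosen for readability, not estimates from the chapter, and the function name is an assumption of this example.

```python
# Rows follow eta1..eta4 (party ID and approval at waves 2 and 3); columns of
# GAMMA follow xi1..xi2 (party ID and approval at wave 1). Values are hypothetical.
GAMMA = [
    [0.8, 0.2],  # eta1 = gamma11*xi1 + gamma12*xi2 + zeta1
    [0.3, 0.2],  # eta2 = gamma21*xi1 + gamma22*xi2 + zeta2
    [0.0, 0.0],  # eta3 has no direct wave-1 predictors
    [0.0, 0.0],  # eta4 has no direct wave-1 predictors
]
BETA = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.2, 0.0, 0.0],  # eta3 = beta31*eta1 + beta32*eta2 + zeta3
    [0.3, 0.5, 0.0, 0.0],  # eta4 = beta41*eta1 + beta42*eta2 + zeta4
]


def solve_recursive(beta, gamma, xi, zeta):
    """Solve eta = B*eta + Gamma*xi + zeta by forward substitution.

    Valid only when B is strictly lower triangular, i.e. the model is recursive.
    """
    eta = []
    for i in range(len(beta)):
        value = zeta[i]
        value += sum(gamma[i][j] * xi[j] for j in range(len(xi)))
        value += sum(beta[i][j] * eta[j] for j in range(i))
        eta.append(value)
    return eta


# A one-unit difference in wave-1 party ID, with all disturbances set to zero.
eta = solve_recursive(BETA, GAMMA, xi=[1.0, 0.0], zeta=[0.0, 0.0, 0.0, 0.0])
```

Because every nonzero entry of $B$ lies below the diagonal, each $\eta$ depends only on earlier ones, which is what allows the simple forward pass.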

Several features of this model are impor- 
tant to note. First, it can be seen that each 
of equations (12a) through (12d) is simply a 
version of the dynamic panel model of (10) 
with the lagged values of $Y$ and $X$ as inde- 
pendent variables. Thus the estimates of $\gamma_{11}$ 
in (12a) and $\beta_{31}$ in (12c) represent the “stabil- 
ity” or “autoregressive” effect of party ID in 
wave 1 or 2 on its own value in wave 2 or 
3; alternatively, subtracting 1 from these esti- 
mates will result in the effect of party ID at 
wave 1 or 2 on the subsequent change in party 
ID over the next panel wave. Similarly, the esti- 
mates $\gamma_{12}$ and $\beta_{32}$ represent the lagged effect 
of presidential approval in wave 1 or 2 on 
subsequent values of party identification, or, 
equivalently, on the change in party identifica- 
tion over time. The corresponding coefficients 
for presidential approval and party identifica- 
tion in equations (12b) and (12d) have exactly 
the same interpretation in terms of the respec- 
tive variables on presidential approval or the 
change in presidential approval over time. Thus 
the SEM setup here is designed to provide infor- 
mation on the relative stability of the two vari- 
ables, as well as on the relative magnitude of 
the two cross-lagged effects across the wave 
1-2 and wave 2-3 periods.¹⁷ 

Second, the model as specified contains no 
contemporaneous reciprocal linkages between 
party identification and presidential approval 
at wave 2 or wave 3, i.e., there are assumed 
to be no effects between $\eta_1$ and $\eta_2$ or between 
$\eta_3$ and $\eta_4$. This means that the cross-lagged 
model in Figure 29.1 is recursive, in contrast to 
a nonrecursive model that would contain feed- 
back effects between variables observed at the 
same panel wave. The assumption of no con- 
temporaneous effects may not be justified, as 
in many panel analyses the length of time for 
X to affect Y may be significantly shorter than 
the time lag between waves of observations. 
Nevertheless, in the absence of strong theory 
the cross-lagged model is usually a satisfactory 
initial model.¹⁸ Nonrecursive models present 
more difficulties than recursive models in iden- 
tifying and estimating causal parameters, as will 
be seen below. 


¹⁷ It is important to note that the “stability” repre- 
sented by the lagged dependent variable is not abso- 
lute stability in the sense of “no change,” but rather 
stability in the sense of relative rankings of cases 
over time. When the autoregressive coefficients are 
closer to 1, this indicates that units with higher val- 
ues at time (t) tend also to have higher values at time 
(t+1), though significant absolute changes may have 
occurred, either due to the effects of the other vari- 
ables in the model or an overall change that affects 
all units equally. 

¹⁸ It is also the case that the cross-lagged model 
can be derived from the “continuous time” panel 
model where both X and Y exert continual influ- 
ence on one another over time, as opposed to having 
effects in discrete time intervals corresponding to the 
waves of measurement (Coleman, 1968; Tuma and 
Hannan, 1994). 


Structural equation panel models are esti- 
mated in the same way as all SEMs (see Bollen, 
1989; Kaplan, 2000). First, the variances and covari- 
ances between observed variables are expressed 
in terms of the unknown parameters $\gamma$, $\beta$, $\psi$, 
and $\phi$, given the model’s assumptions. In this 
case we assume that all variables are expressed 
in mean deviation form, that the $\zeta$ disturbances 
are unrelated to both the $\eta$ and $\xi$ that appear as 
independent variables in their respective equa- 
tions, and that there are no covariances between 
the $\zeta$ disturbances other than the within-wave 
covariances described above. 

Second, it is determined whether the model 
as a whole, and individual equations within it, 
are identified, i.e., whether there is sufficient 
information in terms of the observed variances 
and covariances to produce unique estimates of 
each of the model’s parameters. All recursive 
models are either just-identified or overidentified 
(with more known variances and covariances 
than unknown parameters), though 
this will not be the case in the nonrecursive 
simultaneous effects models we will consider 
shortly. In the model of Figure 29.1, there 
are 17 unknowns (3 $\phi$ variances and covari- 
ances of the exogenous variables, 4 $\gamma$ linking 
exogenous to endogenous variables, 4 $\beta$ link- 
ing endogenous variables, and 6 $\psi$ variances 
and covariances of the structural disturbances) 
and 21 known variances and covariances. This 
means that the model is overidentified with 4 
degrees of freedom. When models are overiden- 
tified, there will be more than one solution for 
at least some of the model’s unknowns, and this 
additional information can be used to assess 
how well the model fits the data as a whole. 
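
The counting behind these figures is simple arithmetic, sketched here in Python as a check. The function and dictionary names are illustrative assumptions; the counts themselves are those given in the text for the model of Figure 29.1.

```python
def known_moments(n_observed):
    """Number of distinct variances and covariances among observed variables."""
    return n_observed * (n_observed + 1) // 2


# Six observed variables: party ID and approval at each of three waves.
observed_variables = 6

# Free parameters of the cross-lagged model, as counted in the text.
free_parameters = {
    "phi (exogenous variances/covariance)": 3,
    "gamma (exogenous -> endogenous)": 4,
    "beta (endogenous -> endogenous)": 4,
    "psi (disturbance variances/covariances)": 6,
}

knowns = known_moments(observed_variables)
unknowns = sum(free_parameters.values())
degrees_of_freedom = knowns - unknowns
```

A model with zero degrees of freedom under this count would be just-identified, and a negative value would signal that the counting rule fails.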

Third, under the assumption of multivari- 
ate normality of the observed variables, max- 
imum likelihood methods are typically used 
to estimate the model parameters. Intuitively, 
the ML procedures find the estimates of the 
unknown parameters, which, taken together, 
minimize the difference between the implied 
and actual variance-covariance matrices (see, 
e.g., Kaplan, 2000, pp. 24-27 for more details). 
Finally, the variance-covariance matrix that 
is implied from the estimated coefficients is 
compared to the observed variance-covariance 
matrix to make assessments of the fit of the 
model as a whole. If a model can reproduce the 
observed variances and covariances very well, it 
is “consistent” with the data. If a model cannot, 
it is “inconsistent” with the data. 

A variety of measures are available for assess- 
ing the significance of particular models in 
terms of their overall explanatory power and 
in terms of making comparisons between alter- 
native models with different unknown parame- 
ters. For example, the quantity $n \log L_0$, where 
$L_0$ is the likelihood function of the estimated 
model, is distributed as $\chi^2$, with degrees of free- 
dom equal to the number of known variances and 
covariances minus the number of estimated parame- 
ters. This “model $\chi^2$” is widely used as a measure 
of the overall fit of the model to the data, with 
low values, relative to degrees of freedom, indi- 
cating better fit. Given the sensitivity of $\chi^2$ to 
sample size, a variety of additional measures 
have been proposed to assess the explanatory 
power of a given model versus alternative base- 
line models; e.g., the Normed Fit Index (NFI) 
compares the model $\chi^2$ to the $\chi^2$ of a “complete 
independence” model with zero covariances 
among all variables in the population, while 
the Parsimony Normed Fit Index (PNFI) penal- 
izes the NFI to the extent that the estimated 
model includes more and more parameters. 
See Hu and Bentler (1995) for a comprehen- 
sive overview of these issues. Finally, when 
models are “nested,” such that one model can 
be defined by relaxing constraints on param- 
eters in another model, the difference in $\chi^2$ 
between the two models provides a test of 
the significance of the improvement in fit 
between the unconstrained versus constrained 
models. 
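
As a rough illustration of how the two indices relate, the sketch below assumes the usual definitions: the NFI as the proportional reduction in $\chi^2$ relative to the independence model, and the PNFI as the NFI multiplied by the ratio of model to baseline degrees of freedom. It also assumes that the independence model for six observed variables has 15 degrees of freedom. The NFI values and model degrees of freedom are those reported in Table 29.2; the function names and the values passed to `nfi` are hypothetical.

```python
def nfi(chi2_model, chi2_baseline):
    """Normed Fit Index: proportional reduction in chi-square from the baseline."""
    return (chi2_baseline - chi2_model) / chi2_baseline


def pnfi(nfi_value, df_model, df_baseline):
    """Parsimony NFI: the NFI scaled by the share of baseline df the model retains."""
    return (df_model / df_baseline) * nfi_value


# Assumption: the independence model frees only the 6 variances, leaving
# 6*5/2 = 15 covariances constrained to zero.
observed_variables = 6
df_baseline = observed_variables * (observed_variables - 1) // 2

# (NFI, model degrees of freedom) as reported for the three models in Table 29.2.
reported = {"(1)": (0.97, 4), "(2)": (0.97, 6), "(3)": (1.00, 5)}
pnfi_values = {label: round(pnfi(value, df, df_baseline), 2)
               for label, (value, df) in reported.items()}

# Purely hypothetical chi-square values, to show the NFI calculation itself.
example_nfi = nfi(chi2_model=30.0, chi2_baseline=1000.0)
```

Under these assumptions the computed values match the PNFI row of Table 29.2, which illustrates how a model with fewer free parameters is rewarded even when its NFI is unchanged.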

The ML estimates of the cross-lagged model 
of Figure 29.1 for the NES panel data (N=738) 
are shown in Table 29.2. The model shows 
that there are significant cross-lagged effects 
in both directions between party identifica- 
tion and presidential approval, with the magni- 
tude of the party-approval effect being approxi- 
mately three to four times as large as the reverse 
in standardized terms. 
The model shows strong stabilities for the party 
identification variable and weak stability for the approval 
of the President, especially between 2000 and 
2002, which may be expected due to the change 
in Presidential administration (despite the cod- 
ing changes described in footnote 15). Nev- 
ertheless, the model shows some support for 
the “revisionist” notion that short-term political 
evaluations influence the intensity of an indi- 
vidual’s identification with particular political 
parties in the US. The estimates, however, are 
in the context of a poorly-fitting model, with 
a large and significant $\chi^2$ and a relatively low 
Parsimony Normed Fit Index of .26. 

Column (2) shows the flexibility of the SEM 
approach in terms of constraining particu- 
lar parameters to be equal in order to test 
the relative explanatory power of alternative 
model specifications. In this model, the cross- 
lagged effects between party and approval from 
waves 1-2 and 2-3 are each constrained to be 
equal, thus gaining 2 degrees of freedom in 
the process. The results show nearly identi- 
cal parameter estimates to model (1), and the 
difference in the two models’ $\chi^2$ is only .27, 
which, with 2 degrees of freedom difference, 
indicates that the unconstrained model does not 
fit the data significantly better than the con- 
strained model. The PNFI shows a correspond- 
ing improvement to .39, reflecting the nearly 
equal explanatory power of this model with 
fewer estimated parameters. In practice, panel 
analysts will estimate a variety of alternative 
models, usually imposing equality constraints 
at the outset and relaxing them as necessary 
as indicated by $\chi^2$ tests and other information 
about model fit. In this case, the overall indices 
show that neither of the models estimated pro- 
vides a particularly good fit to the data, meaning 
that additional parameters, perhaps in the form 
of synchronous causal effects, may need to be 
included. 
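
The nested-model comparison reported here can be reproduced with a few lines of Python. The sketch relies on the fact that a $\chi^2$ variable with exactly 2 degrees of freedom has survival function $e^{-x/2}$, so no statistical library is needed; the variable and function names are illustrative, and the $\chi^2$ values are those reported for models (1) and (2) in Table 29.2.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)


chi2_unconstrained, df_unconstrained = 133.33, 4  # model (1)
chi2_constrained, df_constrained = 133.60, 6      # model (2)

chi2_difference = chi2_constrained - chi2_unconstrained
df_difference = df_constrained - df_unconstrained

p_value = chi2_sf_2df(chi2_difference)

# Critical value at the .05 level for 2 df, from inverting exp(-x/2) = .05.
critical_05 = -2.0 * math.log(0.05)
reject_constraints = chi2_difference > critical_05
```

The difference of .27 falls far below the critical value of about 5.99, so the equality constraints cannot be rejected. For differences with other degrees of freedom the closed form above does not apply and a general $\chi^2$ routine would be needed.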
Table 29.2 Cross-lagged reciprocal effects models, party identification and presidential approval, American 
National Election Study, 2000-2002-2004 


                                                    (1)           (2)           (3)
                                                    No equality   Equality      Measurement error
                                                    constraints   constraints   in party identification
Stability effects
  Party identification, wave 1-2 ($\gamma_{11}$)    .80           .81           .97
    standardized                                    .79           .80           .96
  Party identification, wave 2-3 ($\beta_{31}$)     .88           .88           1.04
    standardized                                    .83           .82           .96
  Presidential approval, wave 1-2 ($\gamma_{22}$)   .15           .14           .09
    standardized                                    .16           .15           .10
  Presidential approval, wave 2-3 ($\beta_{42}$)    .56           .57           .52
    standardized                                    .49           .49           .44
Cross-lagged effects
  Party to approval, wave 1-2 ($\gamma_{21}$)       .31           .31ᵃ          .37ᵃ
    standardized                                    .43           .43           .49
  Party to approval, wave 2-3 ($\beta_{41}$)        .31           .31ᵃ          .37ᵃ
    standardized                                    .37           .37           .43
  Approval to party, wave 1-2 ($\gamma_{12}$)       .17           .16ᵇ          .03ᵇ*
    standardized                                    .13           .12           .02
  Approval to party, wave 2-3 ($\beta_{32}$)        .15           .16ᵇ          .03ᵇ*
    standardized                                    .10           .11           .02
Error covariances
  Wave 1 ($\phi_{21}$)                              2.25          2.25          2.26
    standardized                                    .62           .62           .66
  Wave 2 ($\psi_{21}$)                              .45           .45           .29
    standardized                                    .13           .13           .09
  Wave 3 ($\psi_{43}$)                              .44           .44           .25
    standardized                                    .10           .10           .06
Measurement error variance
  Party identification ($\varepsilon$)              ----          ----          .50
R-squared, party identification, wave 2             .77           .77           .95
R-squared, party identification, wave 3             .79           .79           .95
R-squared, presidential approval, wave 2            .29           .28           .31
R-squared, presidential approval, wave 3            .60           .60           .62
$\chi^2$ (degrees of freedom)                       133.33 (4)    133.60 (6)    10.91 (5)
Normed fit index (NFI)                              .97           .97           1.00
Parsimony normed fit index                          .26           .39           .33


All variables statistically significant; standardized coefficients appear beneath each unstandardized estimate. 
Coefficients a, b constrained in models (2) and (3) to be equal. N = 738. 
4.1 Alternative lag effects 


Given enough waves of observation and enough 
knowns in the form of observed variances and 
covariances, the SEM approach allows consid- 
erable flexibility in specifying alternatives to 
the cross-lagged causal models of Figure 29.1. 
One possibility is a model with synchronous, 
or simultaneous impact of variables on one 
another at a given wave of observations. Such a 
model would be appropriate if the time lag for 
the causal influence of the independent vari- 
able is thought to be short, relative to the time 
period between observations. For example, in 
models of the impact of interpersonal networks, 
if the panel waves were separated by five years, 
a cross-lagged model may not capture network 
impact well, as its effect may either have dis- 
sipated in the intervening time period or the 
network itself may have changed since the ini- 
tial panel wave, or both. In that case, a more 
accurate depiction of the causal process may 
be to include $\beta$ effects between $\eta_1$ and $\eta_2$, and 
between $\eta_3$ and $\eta_4$, instead of the cross-lagged 
effects linking the variables from wave 1 to 
2, and wave 2 to 3. It is also possible that 
both short- and longer-term causal lags could be 
present, in which case the additional $\beta$ effects 
would be included along with the cross-lagged 
processes already specified. 

In either case, the inclusion of synchronous 
causal effects yields a nonrecursive causal 
model whose estimation is significantly more 
complicated than in the recursive, cross-lagged 
only case. This is so for two main reasons. First, 
in synchronous effects models the assumption 
of no correlation between independent vari- 
ables and the error term in their respective 
equations is untenable, meaning that the meth- 
ods we have considered thus far would pro- 
duce inconsistent estimates of the $\beta$. Imagine 
hypothetical causal arrows between $\eta_1$ and $\eta_2$ 
in Figure 29.1, with $\beta_{21}$ representing the effect 
of party ID on approval in 2002, and $\beta_{12}$ rep- 
resenting the reciprocal link between approval 
and party ID. It can immediately be seen that $\eta_2$ 
is related to $\zeta_1$, since $\zeta_1$ causes $\eta_1$, which causes 
$\eta_2$. Similar processes result in a nonzero cor- 
relation between $\eta_1$ and $\zeta_2$, meaning that we 
cannot consistently estimate either $\beta_{21}$ or $\beta_{12}$ 
(or any of the other effects in their respective 
equations) with the information at hand. 

Second, the inclusion of reciprocal syn- 
chronous effects raises the possibility that the 
model as a whole, or some of the individual 
equations, are not identified. For example, in 
a bivariate cross-section model with reciprocal 
causal effects, the model would not be identi- 
fied because there would be four unknowns 
(the two $\beta$ and the variances of the two structural disturbances) 
and only three known variances and covari- 
ances with which to estimate the unknown 
parameters.¹⁹ 

What is needed in the case of nonrecur- 
sive models in general, and synchronous effects 
panel models in particular, is more informa- 
tion in the form of additional observed vari- 
ances and covariances. More information, of 
course, generates additional “knowns” so that 
the counting rule is more likely to be satisfied. 
But the additional variables must be related to 
the included variables in specific ways in order 
to be of use in solving the estimation problems 
posed in nonrecursive models. Specifically, 
what are needed are variables that can help sat- 
isfy the so-called order condition for identifica- 
tion, which states that if an equation involves 
$m$ endogenous variables, then there must be 
at least ($m-1$) excluded exogenous variables 
for the equation to be identified. That is, the 
equation must have at least one $\xi$ that does 
not have an effect on the $\eta$ endogenous depen- 
dent variable in question for every endogenous 
independent variable that does. For example, 


¹⁹ In that case, the model as a whole would not satisfy 
what is referred to as the counting rule for identifica- 
tion, whereby $(m+n)(m+n+1)/2 \geq k$, with $m$ and 
$n$ representing the number of endogenous and exoge- 
nous variables, and $k$ representing the number of free 
parameters. 
Figure 29.2 An instrumental variable ($\xi_1$) in a 
nonrecursive causal model 


in Figure 29.2 we add a variable $\xi_1$ to the 
equation predicting $\eta_1$ in a reciprocally-related 
bivariate cross-sectional model. Following the 
order condition, $\xi_1$ identifies the $\eta_2$ equation, as 
there is now one excluded exogenous variable 
from the equation and one included endoge- 
nous variable ($\eta_1$). Given that no exogenous 
variable is excluded from the $\eta_1$ equation, it is 
still not identified. Thus identification depends 
on the availability of additional exogenous vari- 
ables that affect one and only one of the two 
reciprocally-related variables in a synchronous 
effects model. 

The excluded exogenous variable, also called 
an instrumental variable, is then used to esti- 
mate the model parameters consistently. In the 
simplest case, such as depicted in Figure 29.2, 
the assumptions regarding $\xi_1$, namely, that it 
is uncorrelated with the $\zeta$ and it has no direct 
causal effect on $\eta_2$, allow the expression of $\beta_{21}$ as: 


$$\beta_{21} = \frac{\mathrm{Cov}(\xi_1, \eta_2)}{\mathrm{Cov}(\xi_1, \eta_1)}$$


with the covariance between $\eta_1$ and the struc- 
tural disturbance $\zeta_2$ no longer contaminat- 
ing the estimate of the $\beta_{21}$ causal effect. 
In more complex cases, there may be more 
than one instrument available for inclusion 
in the model, in which case estimation pro- 
ceeds through “two-stage least squares” (TSLS) 
methods or through the general maximum 


likelihood procedures discussed above. In the 
widely-utilized TSLS setup, the endogenous 
independent variable is first regressed on all 
exogenous variables, including all of the instru- 
ments, generating a “predicted $\eta$” which is 
uncorrelated with all of the model $\zeta$. Then, 
in the second stage, the dependent variable is 
regressed on the exogenous independent vari- 
ables and the “predicted $\eta$” from the first stage 
(with appropriate corrections to the standard 
errors in the second stage). 
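
A small numerical sketch may help show why the instrument matters. The Python code below builds a hypothetical reciprocal system in which $\xi_1$ affects $\eta_1$ only, with the instrument and the two disturbances constructed to be exactly uncorrelated in the sample. All parameter values, data vectors, and names are assumptions made for illustration. It then compares the ordinary least squares slope of $\eta_2$ on $\eta_1$ with the ratio-of-covariances instrumental variable estimate.

```python
def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in the ratios below)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


BETA21 = 0.4   # hypothetical effect of eta1 on eta2 (the target parameter)
BETA12 = 0.3   # hypothetical reciprocal effect of eta2 on eta1
GAMMA11 = 1.0  # hypothetical effect of the instrument xi1 on eta1

# Instrument and disturbances: mean zero and mutually orthogonal by construction.
xi1 = [1, -1, 1, -1, 1, -1, 1, -1]
zeta1 = [1, 1, -1, -1, 1, 1, -1, -1]
zeta2 = [1, 1, 1, 1, -1, -1, -1, -1]

# Reduced form of: eta1 = BETA12*eta2 + GAMMA11*xi1 + zeta1
#                  eta2 = BETA21*eta1 + zeta2
scale = 1.0 - BETA12 * BETA21
eta1 = [(GAMMA11 * x + z1 + BETA12 * z2) / scale
        for x, z1, z2 in zip(xi1, zeta1, zeta2)]
eta2 = [BETA21 * e1 + z2 for e1, z2 in zip(eta1, zeta2)]

ols_slope = cov(eta1, eta2) / cov(eta1, eta1)  # contaminated by cov(eta1, zeta2)
iv_slope = cov(xi1, eta2) / cov(xi1, eta1)     # ratio-of-covariances IV estimate
```

In this constructed example the instrumental variable ratio recovers the value of 0.4 used to generate the data, whereas the least squares slope is pulled upward by the feedback from $\eta_2$ to $\eta_1$. With a single instrument, two-stage least squares gives the same point estimate as this ratio.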

All of these procedures, however, depend on 
the satisfaction of the assumptions of instru- 
mental variable analyses. That is, there must 
be variables included in the model that affect 
one and only one of the two reciprocally-related 
endogenous variables, and these variables must 
be exogenous in the sense of unrelated to the 
structural disturbances of the endogenous vari- 
ables. These variables are difficult to find in 
many practical research situations. In panel 
designs, however, it may be reasonable under 
some conditions to assume that the lagged val- 
ues of variables are exogenous and related to the 
endogenous variables in such ways as to
facilitate identification and estimation. For
example, consider a “pure” two-wave synchronous
effects version of Figure 29.1 with no
cross-lagged effect from either ξ₁ to η₂ or from ξ₂
to η₁. In that case ξ₁ would be used to
identify the η₂ equation and, similarly, ξ₂ would
be used to identify the η₁ equation. Thus, if
the assumptions of exogeneity can be justified, 
the longitudinal structure of panel data can 
provide additional information in the form of 
instrumental variables for purposes of identi- 
fying and estimating nonrecursive, reciprocal 
effects models. 

In many instances, however, the situation 
will be complicated by the violation of the exo- 
geneity assumptions. If there are autocorrelated 
disturbances present in the model, for exam- 
ple, then the lagged value of a variable will 
not be unrelated to the structural disturbance 
of the endogenous variables and hence will not 
be a suitable instrument. Further, in models
with both cross-lagged and synchronous effects, 
the lagged value of a variable, even if unre- 
lated to the structural disturbances, is assumed 
to have effects on both endogenous variables 
at subsequent waves of observation, and so 
cannot be used to identify the subsequent 
wave’s equations. One possible solution would 
be to include additional exogenous variables 
as instruments, provided that they satisfy the 
exclusion restrictions discussed above. Another 
common possibility is to identify the model 
through the use of equality constraints, such 
that the cross-lagged effects may be assumed to 
be equal from waves 1 to 2 and 2 to 3, the syn- 
chronous effects may be assumed to be equal in 
waves 2 and 3, and perhaps the stability effects 
equal across waves as well. Given that some of 
these models will be nested within each other, 
comparison of chi-square goodness of fit mea- 
sures can provide some insight into the models 
that are most consistent with the observed data; 
with that information, along with the param- 
eter estimates found for the various models, 
the analyst may arrive at conclusions regard- 
ing the likely pattern of lag causal effects 
between variables over the course of the panel 
observations. 

In the current example, we re-estimate model 
(2) in Table 29.2 by including both synchronous 
and cross-lagged effects between party and 
approval, and specifying equality constraints 
between the two sets of contemporaneous 
effects. The results show that the cross-lagged 
effects are still significant in both directions, 
while neither contemporaneous effect is
significant, with estimated values of .001 from party to
approval and −.05 from approval to party. The
model χ² is 133.3 with 4 degrees of freedom,
thus not a significant improvement from the 
cross-lagged only model in Table 29.2, given the 
loss of 2 degrees of freedom. We conclude that 
the cross-lagged model is superior to a model 
with both cross-lagged and synchronous effects, 
though neither model fits the data well. 
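The nested-model comparison can be checked by hand. The sketch below assumes that the comparison model is the cross-lagged model without measurement error, for which the chapter later reports a χ² of 133.6 with 6 degrees of freedom; that pairing is inferred from the stated loss of 2 degrees of freedom. The function name is hypothetical, and the closed-form tail probability used here is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate, closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


chi2_cross_lagged, df_cross_lagged = 133.6, 6   # cross-lagged effects only
chi2_with_sync, df_with_sync = 133.3, 4         # adds constrained synchronous effects

delta_chi2 = chi2_cross_lagged - chi2_with_sync
delta_df = df_cross_lagged - df_with_sync
p_value = chi2_sf_even_df(delta_chi2, delta_df)

print(f"chi-square difference {delta_chi2:.1f} on {delta_df} df, p = {p_value:.2f}")
```

A difference of about 0.3 on 2 degrees of freedom is far from any conventional significance threshold, which is the basis for preferring the more parsimonious cross-lagged model.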

4.2 Measurement error


The SEM approach to panel analysis is also 
often extended to include the estimation of 
causal effects, controlling for errors of
measurement in the variables of interest. It is
well-known that random measurement error in
independent variables causes estimation of
regression coefficients to be biased, downward in the
case of bivariate equations and in unknown 
directions in multivariate models (Bollen, 
1989). In cross-sectional analyses, there is often 
not enough information to identify and estimate 
both the structural effects and the measure- 
ment error that may be present in the model, as 
there needs to be multiple observed indicators 
of the presumed “latent” (error-free) variable of 
interest in order to proceed. With panel data, 
though, the information that is provided from 
the same variables over time allows much more 
flexibility in estimating structural effects once 
measurement error is taken into account, and 
in estimating and assessing the measurement 
properties of particular indicators as well. In 
the LISREL and other applications of the SEM 
framework, all of these effects are estimated 
simultaneously, providing powerful additional 
tools to the panel analyst in strengthening the 
causal inference process. 

Figure 29.3 shows a three-wave autoregressive
panel model that includes a random
measurement component. The model depicts each
wave’s indicator of y as a function of an
unobserved latent “true” score η plus a random
measurement component ε. In equation form, the
model is written in two parts as:

\[ y_{it} = \lambda_t \eta_{it} + \varepsilon_{it} \qquad \text{(Measurement Model)} \tag{14a} \]

\[ \eta_{it} = \beta_{t,t-1} \eta_{i,t-1} + \zeta_{it} \qquad \text{(Structural Model)} \tag{14b} \]

with the ε assumed to be normally distributed
random variables, uncorrelated with the η and
the structural disturbances ζ, and, in basic
models, uncorrelated with each other over time.

Figure 29.3 Three-wave autoregressive model
with measurement error

The model differs from (13) because of the presence
of random measurement error in the observed
variables, which was assumed (unrealistically)
to be zero in all of the models considered to
this point. Expressing, for example, the
structural portion of the model from wave 1 to wave
2 in terms of the fallible indicators shows the
consequences of this omission, setting λ = 1 for
simplicity:


\[ y_{i2} = \beta_{21} y_{i1} - \beta_{21} \varepsilon_{i1} + \zeta_{i2} + \varepsilon_{i2} \tag{15} \]


As can be seen, the regression of y₂ on y₁,
with both variables containing random error,
will have a larger error variance than the true
structural disturbance ζ₂, with a
correspondingly lower R-squared and inefficient estimates
of the model coefficient’s standard errors; even
more consequential is that estimates of β₂₁, the
autoregressive or “stability” effect in the model,
will be inconsistent, as y₁ is intrinsically related
to the error term in equation (15) due to the
presence of ε₁.
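The substitution behind equation (15), and the resulting attenuation of the stability estimate, can be written out step by step. The derivation below is a sketch that assumes λ = 1 and measurement errors uncorrelated with the latent scores, with the structural disturbances, and with each other; the OLS probability limit in the last line is a standard errors-in-variables result added here for illustration.

```latex
\begin{align*}
y_{i2} &= \eta_{i2} + \varepsilon_{i2} \\
       &= \beta_{21}\,\eta_{i1} + \zeta_{i2} + \varepsilon_{i2} \\
       &= \beta_{21}\,(y_{i1} - \varepsilon_{i1}) + \zeta_{i2} + \varepsilon_{i2} \\
       &= \beta_{21}\,y_{i1} - \beta_{21}\,\varepsilon_{i1} + \zeta_{i2} + \varepsilon_{i2}, \\[6pt]
\operatorname{plim}\,\hat{\beta}_{21}^{\mathrm{OLS}}
  &= \frac{\operatorname{Cov}(y_{i1}, y_{i2})}{\operatorname{Var}(y_{i1})}
   = \beta_{21}\,
     \frac{\operatorname{Var}(\eta_{i1})}
          {\operatorname{Var}(\eta_{i1}) + \operatorname{Var}(\varepsilon_{i1})}.
\end{align*}
```

In the bivariate case the estimate is therefore scaled down by the reliability of the wave 1 indicator.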

This is an important result, showing that
measurement error in the basic dynamic panel
model yields an incorrect estimate of the
stability parameter β₂₁, often lower than its true
value. And to the extent that other
independent variables are (positively) related to both y₁
and y₂, their effect in equation (15) is likely to
be biased upwards, leading to potentially
erroneous conclusions about their effects on y₂ or
Δy. Thus, it is essential to take measurement
error into account in panel models, and failure 
to do so is likely to negate many of the advan- 
tages of panel analysis in estimating dynamic 
causal processes. 
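Both biases can be seen in a small simulation. The sketch below is illustrative only: the parameter values (a true stability of 0.8, a true effect of 0.3 for a covariate x, and measurement error variance of 0.5) and the function names are hypothetical. It compares the regression on the error-free latent scores with the same regression on fallible indicators.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def ols_two_regressors(y, x1, x2):
    """OLS slopes of y on x1 and x2 (intercept included), via covariances."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


rng = random.Random(7)
beta, gamma = 0.8, 0.3     # hypothetical true stability and true effect of x
n = 20000
x, eta1, eta2, y1, y2 = [], [], [], [], []
for _ in range(n):
    xv = rng.gauss(0, 1)
    e1 = 0.5 * xv + rng.gauss(0, 1)                   # latent wave-1 score
    e2 = beta * e1 + gamma * xv + rng.gauss(0, 0.5)   # latent wave-2 score
    x.append(xv)
    eta1.append(e1)
    eta2.append(e2)
    y1.append(e1 + rng.gauss(0, 0.5 ** 0.5))          # fallible indicators,
    y2.append(e2 + rng.gauss(0, 0.5 ** 0.5))          # error variance 0.5

stability_latent, effect_latent = ols_two_regressors(eta2, eta1, x)
stability_observed, effect_observed = ols_two_regressors(y2, y1, x)

print(f"latent scores:       stability {stability_latent:.2f}, x effect {effect_latent:.2f}")
print(f"fallible indicators: stability {stability_observed:.2f}, x effect {effect_observed:.2f}")
```

Under these assumed values the stability estimate from the fallible indicators falls well below 0.8, while the estimated effect of x rises above 0.3, because x picks up part of the lagged variable's influence that the error-laden indicator fails to capture.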

Given the statistical nature of the problem,
i.e., the correlation between the independent
variable y₁ and the error term in equation (15),
one solution would be to use instrumental
variable analysis. If an exogenous variable ξ exists,
such that it is uncorrelated with both the
measurement error in y and the structural
disturbance ζ, then estimation could proceed via the
TSLS or the ML procedures considered thus far.
In the first stage, y₁ would be regressed on all
exogenous variables, including the
instrumental variable(s), and in the second stage, y₂ would
be regressed on all independent variables in its
equation, along with the predicted value of y₁
from the first stage, which would be purged of
the correlation with its error term. There are two
drawbacks to this approach. One is practical, in 
that it is commonly difficult to find exogenous 
variables that satisfy the exclusion restriction 
discussed above, i.e., that are related to a fal- 
lible indicator in one wave of observation but 
not the next. 

Second, the IV solution, even if successfully 
implemented, does not provide information 
about the measurement properties of indicators 
in the model, which may often be of consid- 
erable interest. In terms of the measurement 
model of (14a), we may wish to know the
“reliability” of y, defined as the quantity

\[ \mathrm{Var}(\eta) / \mathrm{Var}(y) \]


or the proportion of the observed variance that 
is comprised of “true score” variance. This 
information may be especially useful in
multiple indicator models, i.e., those where more
than one indicator for a given latent construct is 
available. In that case, assessing the reliabilities 
of the specific indicators, both absolutely and 
in relation to one another, provides additional 
information that can be used to determine the 
adequacy of the indicators or their suitability as 
measures of the underlying construct. 

As the amount of information available 
increases, in terms of waves of observation 
and number of indicators of latent variables, 
panel models with measurement error may be 
identified and estimated with fewer and fewer 
restrictive assumptions. In the model of
equations (14a) and (14b), for example, we have
six observed variances and covariances, and 11
unknowns: the three λ, the three variances of
the measurement error ε, two β, and three
variances of the structural disturbances ζ. We may
make some progress in setting all of the λ to
equal 1; this has no substantive bearing on the
model and simply puts the latent variable on
the same measurement scale as the observed
y indicators.²⁰ This leaves 8 unknowns and 6
knowns in the model. Wiley and Wiley (1970)
propose that identification be achieved in this
case by constraining the variance of the
measurement error term ε to be equal across the
three waves, gaining 2 degrees of freedom in
the process as well. Under these assumptions,
the model is just-identified and the relevant
parameters can be solved through algebraic
manipulation of the observed indicators’
variances and covariances, as:


\[ \mathrm{Var}(\varepsilon) = \mathrm{Var}(y_2) - \left[ \mathrm{Cov}(y_3, y_2)\, \mathrm{Cov}(y_2, y_1) / \mathrm{Cov}(y_3, y_1) \right] \tag{16a} \]

\[ \beta_{21} = \mathrm{Cov}(y_2, y_1) / \left[ \mathrm{Var}(y_1) - \mathrm{Var}(\varepsilon) \right] \tag{16b} \]

\[ \beta_{32} = \mathrm{Cov}(y_3, y_1) / \mathrm{Cov}(y_2, y_1) \tag{16c} \]


²⁰ In multiple indicator models, one indicator’s λ is
set to 1 for the same reason, while the other λ are
free to vary.


Once these manipulations are accomplished, it
is straightforward to solve for the reliabilities of the
indicators, given the observed Variance(y_t) =
Variance(η_t) + Variance(ε).²¹
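The Wiley-Wiley solution in (16a) to (16c) amounts to a few lines of arithmetic on the observed moments. The sketch below assumes λ = 1 and equal error variances across the three waves, as in the text; the function name and the input variances and covariances are hypothetical values chosen for illustration, not estimates from the chapter's data.

```python
def wiley_wiley(var_y1, var_y2, var_y3, cov_y1y2, cov_y2y3, cov_y1y3):
    """Solve the just-identified three-wave, single-indicator model."""
    var_eps = var_y2 - (cov_y2y3 * cov_y1y2) / cov_y1y3   # equation (16a)
    beta21 = cov_y1y2 / (var_y1 - var_eps)                # equation (16b)
    beta32 = cov_y1y3 / cov_y1y2                          # equation (16c)
    # Reliability at each wave: true-score variance over observed variance.
    reliabilities = [(v - var_eps) / v for v in (var_y1, var_y2, var_y3)]
    return {
        "var_eps": var_eps,
        "beta21": beta21,
        "beta32": beta32,
        "reliabilities": reliabilities,
    }


# Hypothetical observed moments for a three-wave panel.
result = wiley_wiley(
    var_y1=4.5, var_y2=4.74, var_y3=4.2136,
    cov_y1y2=3.6, cov_y2y3=3.392, cov_y1y3=2.88,
)
for name, value in result.items():
    print(name, value)
```

Because the model is just-identified, these formulas reproduce the observed moments exactly and provide no test of fit; an implausible result, such as a negative error variance, signals that the equal-error-variance assumption is doubtful for the data at hand.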

More generally, the maximum likelihood
estimation of measurement error and
structural coefficients may proceed simultaneously
within the LISREL or other applications of
the SEM approach. The measurement
equations in (14a) would be summarized in matrix
form as:

\[ y = \Lambda_y \eta + \varepsilon \tag{17a} \]

and, if measurement error were assumed to be
present in the exogenous ξ variables, then

\[ x = \Lambda_x \xi + \delta \tag{17b} \]


with the Λ matrices being (q × m) and (p × n)
matrices of the λ linking the q indicators (y) of
the m endogenous variables and p indicators (x)
of the n exogenous variables, and ε and δ being
(q × 1) and (p × 1) vectors of the measurement
errors in y and x, respectively. The implied
variance-covariance matrix that expresses the
knowns in terms of the unknown model
parameters can then be expanded to include the
unknowns in Λ_y, Λ_x, and the variances of the ε
and δ, denoted as Θ_ε and Θ_δ, respectively (see
Kaplan, 2000, p. 56). Maximum likelihood
estimation then proceeds as before, assuming
multivariate normality for all the observed variables
(and assuming that appropriate constraints are
imposed when necessary to achieve parameter
identification).


²¹ The single-indicator, three-wave model may also
be identified through the Heise (1969) procedure. In
this model, the latent and observed variables are
standardized, so that the unknowns are the two β
stabilities and the three λ coefficients, which represent path
coefficients linking the latent variables and observed
indicators (and thus λ² represents the reliability
coefficient). Under the assumption of equal reliabilities
across waves, the model is just-identified.
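To make the idea of an implied variance-covariance matrix concrete, the sketch below writes out the matrix implied by the three-wave, single-indicator model of (14a) and (14b), assuming λ = 1 and a common measurement error variance. The function name and the parameter values are hypothetical; an estimation routine would search for the parameter values whose implied matrix is closest to the observed one.

```python
def implied_covariance(var_eta1, beta21, beta32, var_zeta2, var_zeta3, var_eps):
    """Covariance matrix of (y1, y2, y3) implied by the model parameters."""
    var_eta2 = beta21 ** 2 * var_eta1 + var_zeta2
    var_eta3 = beta32 ** 2 * var_eta2 + var_zeta3
    c12 = beta21 * var_eta1            # Cov(y1, y2)
    c23 = beta32 * var_eta2            # Cov(y2, y3)
    c13 = beta32 * beta21 * var_eta1   # Cov(y1, y3)
    return [
        [var_eta1 + var_eps, c12, c13],
        [c12, var_eta2 + var_eps, c23],
        [c13, c23, var_eta3 + var_eps],
    ]


# Hypothetical parameter values.
sigma = implied_covariance(
    var_eta1=4.0, beta21=0.9, beta32=0.8,
    var_zeta2=1.0, var_zeta3=1.0, var_eps=0.5,
)
for row in sigma:
    print([round(value, 4) for value in row])
```

The six distinct elements of this matrix are the "knowns" referred to in the text, and each is a function of the six free parameters passed to the function.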

There are several options for incorporat- 
ing the information regarding measurement 
error into full-blown cross-lagged or syn- 
chronous structural equation models. In three- 
wave single-indicator models, one method is to 
fix the measurement error variances of the indi- 
cators at the values that are obtained through 
the Wiley-Wiley procedure, with the structural 
effects then estimated while correcting for the 
unreliability of the indicators. Another method 
is to allow the measurement error variances to 
be completely free parameters, with LISREL or 
alternative programs producing simultaneous 
estimates of measurement and structural coeffi- 
cients in the model. 

This procedure is illustrated in model (3) 
of Table 29.2, which shows a reanalysis of 
the cross-lagged panel model of Figure 29.1 
while allowing for measurement error in the 
party identification equation. As can be seen, 
the stability of the party variable rises consid- 
erably compared to the previous estimation, 
and neither cross-lagged effect from approval 
to party identification now is statistically sig- 
nificant or substantively meaningful, with stan- 
dardized values of .04 or less. By contrast, the 
cross-lagged effect from party to approval is
somewhat larger than in the
no-measurement-error model, and the overall model χ² is much
improved (10.91 with 5 degrees of freedom,
compared with 133.6 with 6 degrees of
freedom in the previous model). Thus
interpretation of the causal effects in panel models may
be substantially altered once measurement error
is taken into account; in this case the
measurement error model shows results that are
much more in accord with traditional views of
party identification as the “unmoved mover”
of short-term political evaluations (Green and
Palmquist, 1990). And in this case, the
reliabilities of the party identification measures are
calculated as approximately 90%, meaning that
only 10% of the indicator variance was
estimated to be “error”; when indicators exhibit
less reliability, then differences between
measurement error models and models assuming
perfect measurement will be even greater.²²

It should be emphasized, however, that the 
measurement models estimated thus far depend 
on their own set of assumptions that need to 
be justified. For the Wiley-Wiley procedure, for 
example, it must be assumed that the error vari- 
ances are equal over time for the three-wave 
panel model to be identified. This may be unre- 
alistic, and work conducted with longer-term 
panels suggests that error variances tend in fact 
to shrink over time (Jagodzinski, et al., 1987). 
In the present case, moreover, the Wiley-Wiley 
method failed to produce credible estimates 
in the case of presidential approval, indicat- 
ing that the assumptions in the model were 
unlikely to be satisfied. 

In such instances, the analyst may simulate
results by plugging in different values of error
variances, assuming perhaps some degree of
constant shrinkage over time. But the more
promising solution, as always in measurement
error models, is whenever possible to add
waves of observation and/or additional
indicators for the latent constructs. With four waves
of data, the assumption of equal error variances
can be relaxed such that the measurement errors
of the “inner” indicators y₂ and y₃ are
identified without constraint; moreover, the stability
effect from wave 2 to wave 3 (β₃₂) is
overidentified, thus indicating that the fit of the model as
a whole can be assessed. With longer-wave
panels, more parameters will be identified without
the restrictive constraints of the Wiley-Wiley
procedure. And when multiple indicators are
available, models that incorporate
autocorrelation between the errors of measurement over
time may also be estimated, with fewer
constraints necessary as the waves of observation
and number of indicators increase.


²² For example, the observed variance of the party
identification indicator in 2002 is 4.84. Given the estimated
error variance of .5, this yields a reliability estimate
of (4.84 − .50)/4.84 = .90.


5 Dynamic panel models with 
unobserved heterogeneity 


A natural extension of the models that we 
have been considering thus far is to incorporate 
both dynamic causal processes (and potentially 
measurement error) along with unobserved het- 
erogeneity into a single model. Both the econo- 
metric and the SEM panel traditions provide the 
ability to incorporate and estimate these kinds 
of models. 

The dynamic model with heterogeneity takes 
the following general form: 


\[ Y_{it} = \alpha + \beta_1 Y_{i,t-1} + \beta_2 X_{1it} + \beta_3 X_{2it} + \cdots + \beta_j X_{jit} + U_i + \varepsilon_{it} \tag{18} \]


with β₁ representing the effect of the lagged
endogenous variable Y_{t-1} and U_i, as before,
representing the unit-specific effect. It can be seen
that the model combines equation (2), the basic
model for unobserved heterogeneity, with
equation (10), the basic dynamic model with
contemporaneous effects of the Xs, though lagged
values of X could also have been included in
the model.²³ The combined model thus
corresponds to a situation where, for reasons
discussed earlier, the lagged endogenous variable
is thought to exert direct causal influence on its
subsequent values, and there are stable
unobserved differences between the units that push
individual cases higher or lower on the
dependent variable, aside from the included Xs and
aside from the dynamic processes represented
by the effect of Y_{t-1}.


²³ As will be seen, in the SEM version of the model,
the lagged values are normally the values that are
included in order to ensure model identification.

The inclusion of both kinds of effects in the 
same panel model results in some estimation 
difficulties, however. The problem stems from
the fact that Y_{t-1}, the lagged endogenous
variable, is intrinsically related to the composite
error term of (18) due to the presence of U_i. This
can be seen by lagging equation (18) by one time
period, as in:


\[ Y_{i,t-1} = \alpha + \beta_1 Y_{i,t-2} + \beta_2 X_{1i,t-1} + \beta_3 X_{2i,t-1} + \cdots + \beta_j X_{ji,t-1} + U_i + \varepsilon_{i,t-1} \tag{19} \]


which shows the direct dependence of Y_{i,t-1} on
U_i, and hence the bias produced by traditional
methods of estimating (18). Moreover, one
popular method for “sweeping out” the unit effect,
the “fixed effect” transformation of equation (4),
fails to correct the problem, as the solution
for eliminating U_i from consideration produces
a transformed error term of (ε_{it} − ε̄_i). Since ε̄_i
contains some portion of ε_{i,t-1}, the lagged
endogenous variable (Y_{t-1}) is still related to the
mean-differenced error term, and thus in
relatively short panels the biases in estimating the
dynamic effects in this model still exist.²⁴

The solution to the problem lies in an adap- 
tation of the first-difference (FD) model consid- 
ered above in (6): 


\[ Y_{it} - Y_{i,t-1} = \beta_1 (Y_{i,t-1} - Y_{i,t-2}) + \beta_2 (X_{1it} - X_{1i,t-1}) + \beta_3 (X_{2it} - X_{2i,t-1}) + \cdots + \beta_j (X_{jit} - X_{ji,t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1}) \tag{20} \]

In this model, the U_i have been differenced out
of the equation, with the resulting error term
the difference between the idiosyncratic error
at times t and t−1. It can be seen, however,
that the differenced lagged endogenous variable
(Y_{i,t-1} − Y_{i,t-2}) will still be related to the
differenced error term, given the presence of ε_{i,t-1} in
the latter.


²⁴ As T → ∞, ε̄_i → 0, and hence the error term in (18)
would no longer be related to Y_{t-1} so long as no
autocorrelation is present in ε_{it}. But with short panels,
bias on the order of 1/T exists. See Nickell (1981).

What are needed are instrumental variables 
that are related to the differenced lagged 
endogenous variable but unrelated to the dif- 
ferenced error term, and various candidates 
have been proposed in the literature.
Anderson and Hsiao (1982) suggest several possible
instruments, one being the twice-differenced
lagged endogenous variable (Y_{i,t-2} − Y_{i,t-3}), and
the other being the level of Y_{i,t-2}; both proposed
instruments are unrelated to the error term in
(20).²⁵ One drawback to the former solution,
however, is that it requires at least four waves of
data, and subsequent work also suggests that the
twice-differenced lagged endogenous variable
is often a poorly-performing instrument in that
it is usually only weakly related to (Y_{i,t-1} − Y_{i,t-2}).
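The problem with the first-difference model, and the remedy of using the level of Y_{i,t-2} as an instrument, can be illustrated with a simulated three-wave panel. The sketch below is illustrative only: the true dynamic effect of 0.5, the unit-effect and error variances of 1, the absence of X variables, and the function names are all assumptions made for the example. It compares OLS on the differenced equation with the simple single-instrument IV estimate.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def simulate_panel(n=20000, beta=0.5, burn_in=20, seed=3):
    """Three observed waves of Y_it = beta * Y_i,t-1 + U_i + e_it per unit."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        u = rng.gauss(0, 1)          # stable unit effect U_i
        y = u / (1 - beta)           # start at the unit's long-run mean
        history = []
        for _ in range(burn_in + 3):
            y = beta * y + u + rng.gauss(0, 1)
            history.append(y)
        panel.append(tuple(history[-3:]))   # keep the last three waves
    return panel


panel = simulate_panel()
y_t2 = [w[0] for w in panel]                   # Y_{i,t-2}
y_t1 = [w[1] for w in panel]                   # Y_{i,t-1}
y_t = [w[2] for w in panel]                    # Y_{it}
d_y = [b - a for a, b in zip(y_t1, y_t)]       # Y_{it} - Y_{i,t-1}
d_lag = [b - a for a, b in zip(y_t2, y_t1)]    # Y_{i,t-1} - Y_{i,t-2}

# OLS on the differenced equation: d_lag is correlated with the differenced error.
fd_ols = cov(d_lag, d_y) / cov(d_lag, d_lag)

# IV with the level Y_{i,t-2} as the instrument for d_lag.
fd_iv = cov(y_t2, d_y) / cov(y_t2, d_lag)

print(f"first-difference OLS {fd_ols:.2f}   IV with lagged level {fd_iv:.2f}")
```

Under these assumptions the OLS estimate on the differenced data is badly biased and even takes the wrong sign, while the IV estimate is close to the true value, though with noticeably more sampling variability than OLS would have in a correctly specified model.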

The Arellano-Bond solution rests on the fact
that the panel structure of the data provides
more and more potential instruments in
equation (20) as the number of waves of
observation increases (Arellano and Bond, 1991). For
three-wave data, Y_{i,t-2} may be used as an
instrument for (Y_{i,t-1} − Y_{i,t-2}), for four waves of data
Y_{i,t-2}, Y_{i,t-3} and (Y_{i,t-2} − Y_{i,t-3}) may be used, for
five waves of data Y_{i,t-2}, Y_{i,t-3}, Y_{i,t-4} and all the
respective change scores may be used, and so
on. So as one moves through the panel more
and more instruments are included to arrive
at more precise estimates of the dynamic and
other effects in the model. In the Arellano-Bond
formulation, the various lagged levels and
differences of the exogenous X variables are also
included as instruments for the same reason.
One drawback to the method is its
inapplicability when there is serial correlation in the
idiosyncratic error term of the original equation (18), and
the need for at least four waves of data to test
this assumption.²⁶


²⁵ This strategy would not have worked with the fixed
effect transformation, as Y_{i,t-2} would still be related
to the portion of the error term ε̄_i which contains
ε_{i,t-2}.
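The way the instrument set grows with the number of waves can be listed mechanically. The sketch below simply follows the verbal description given above for the final wave's differenced equation; it is not a full construction of the instrument matrix used in generalized method of moments software, and the function name and the string labels are hypothetical.

```python
def lagged_instruments(n_waves):
    """Instruments for (Y[t-1] - Y[t-2]) in the last wave's differenced equation."""
    if n_waves < 3:
        raise ValueError("at least three waves are needed")
    level_lags = list(range(2, n_waves))   # lags 2, ..., n_waves - 1
    levels = [f"Y[t-{k}]" for k in level_lags]
    # Change scores between adjacent lagged levels.
    changes = [f"(Y[t-{k}] - Y[t-{k + 1}])" for k in level_lags[:-1]]
    return levels + changes


for waves in (3, 4, 5):
    print(waves, lagged_instruments(waves))
```

Each added wave contributes one further lagged level and one further change score, which is the sense in which longer panels supply "more and more" instruments.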

The strategy of using the panel structure of 
the data to find suitable instruments is also 
employed in econometric models that allow for 
reciprocal causality and measurement error. As 
has been discussed above, the statistical prob- 
lem that results from either reciprocal causal 
effects specification, or from the presence of 
measurement error in variables, is an intrinsic 
correlation between the independent variable(s) 
and the structural disturbance term for that vari- 
able’s equation. Hence some of the independent 
variables X in either (2), the original unobserved 
heterogeneity model, or the dynamic model (18) 
must be treated as endogenous. This leads nat- 
urally to the application of instrumental vari- 
ables analyses in the context of first difference 
or fixed effects models, using lagged values of 
the independent variables as instruments under 
certain conditions (see Halaby, 2004, pp. 532-535;
Wooldridge, 2002, Chapter 12).

Dynamic models with unobserved hetero- 
geneity may also be estimated within the SEM 
framework, though applications of this kind 
are less common in the literature (Dorman, 
2001). An example of such a model is shown 
in Figure 29.4, with the basic features being the 
dynamic cross-lagged reciprocal effects
specification considered earlier, along with an
additional exogenous variable (ξ₃) that represents
the individual-specific effect for each unit.

²⁶ See Wawro (2002) for discussion of alternative
dynamic panel estimators, and application of these
procedures to the controversy regarding the
endogeneity of political party identification.

Figure 29.4 Three-wave cross-lagged model
with stable unobserved variable (ξ₃)

The unit-effect corresponds to the U_i term
in equation (18), or, if no dynamic processes
are specified, to the U_i term in the basic
unobserved heterogeneity model of equation (2). It
is assumed to be stable over time (and thus can
be represented by a single ξ), and it is assumed
to be related to each of the other latent
variables in the model as well. The unit-effect is
specified within the SEM framework as an addi- 
tional latent variable with no observed indica- 
tor and with variance set arbitrarily to 1—i.e., 
it is a “phantom variable” that is identified 
only if there is enough other excess informa- 
tion from the observed variances and covari- 
ances in the model. In the present case, the 
model is identified so long as some equal- 
ity constraints are placed on the coefficients 
in the equations for the wave 2 and wave 3 
variables, e.g., equal cross-lagged effects, equal 
stabilities, or equal error variances. Of course, 
alternative models that impose all of these con- 
straints may also be estimated and compared. 
In the present example, an unobserved variable 
model linking party identification and presi- 
dential approval over time shows an excellent 
fit to the data (chi-square of 3.66 with 3 df), 
and the results indicate that there are no sig- 
nificant cross-lagged relationships between the 
two variables in either direction. 

There are several attractive features of the 
SEM version of this model. First, the ana- 
lyst may make use of the full range of 
SEM-related procedures for incorporating
measurement error into the analysis, as the
identification of the structural and measurement
portions of the model are largely independent. 
Indeed, comprehensive models of this kind may 
often be necessary to estimate, as it otherwise 
may be unclear whether the unobserved vari- 
able represents the stable “unit-effect” or sim- 
ply a latent variable that represents one or 
more of the observed constructs purged of mea- 
surement error. Second, alternative unobserved 
variable models may be tested and compared 
in terms of their ability to account for the 
observed data. This is especially useful in that 
a model with the unit-effect being related to 
only the endogenous variables over time will be 
nested within a model that has the unit-effect 
related to both endogenous and exogenous
variables.²⁷ In this way the covariance structure
analysis can provide a statistical test of the 
random effects versus fixed effects specifica- 
tion of the unobserved heterogeneity model (see 
Teachman, et al., 2001, for further development 
of this model). Finally, with enough waves of 
observation, more elaborate unobserved vari- 
ables models may be specified, some that allow 
the unobserved variable to change over time 
(Dorman, 2001). In these ways the full power 
and flexibility of the SEM approach can be 
used to estimate models providing comprehen- 
sive attempts to overcome the most significant 
threats to successful causal inference in
nonexperimental research.


²⁷ That is, in one model the covariances between
the unit effect and any other ξ would be fixed at
zero, while in the other they would be estimated
parameters.


6 Conclusion


This chapter has outlined two approaches to
panel analysis, one focusing on the
problem of unobserved heterogeneity, and the
other focusing on dynamic causal processes
and measurement error. Models
incorporating unobserved heterogeneity were described
mainly in the context of pooled econometric- 
type estimation, while the dynamic mod- 
els with measurement error corrections were 
described mainly in the context of structural 
equation modeling procedures. As mentioned, 
however, recent work within the two traditions 
has resulted in a greater convergence of mod- 
els and analytic strategies. This convergence is 
likely to continue, as newer developments in 
the field are even more explicitly synthetic in 
their approach. 

For example, the recent work of Skrondal and 
Rabe-Hesketh (2004) incorporates all manner of 
latent variables, from unobserved heterogeneity 
to “true score variables” purged of measurement 
error in their indicators, to latent responses that 
represent missing values of partially-observed 
variables, into a single analytical framework. 
More generally, the realization that panel and 
“multilevel” data share the same logical struc- 
ture (as the observations over time are nested 
within individual units) has led to the devel- 
opment of models that incorporate intratempo- 
ral growth processes at the “lower” level that 
may also vary randomly at the “higher” level. 
Such models incorporating randomly-varying 
intercepts and randomly-varying slopes may 
be estimated either through an extension of 
the random effects model discussed above, or 
with structural equation methods that treat the 
intercepts and slopes as “latent” variables that 
vary randomly across units (Bollen and Curran, 
2005; Singer and Willett, 2002; Chapters 33-35 
of this volume). Future developments in linear 
panel analysis, then, promise to build on the 
approaches presented in this chapter, to syn- 
thesize and to extend them in important new 
directions. 

Data and software


STATA 9.0 was used to estimate the pooled
“econometric”-type models in the first section
of this chapter. This software is available at:
http://www.stata.com/. Many other software
packages can be used for these models as well,
including Mplus (http://www.statmodel.com/),
SAS (http://www.sas.com/),
MLwiN (http://www.cmm.bristol.ac.uk/),
SPlus (http://www.insightful.com/products/splus/default.asp),
and LIMDEP (http://www.limdep.com).

LISREL 8.72 was used to estimate the
structural equation models in the second section
of the chapter. This software is available
at: http://www.ssicentral.com/lisrel/index.html.
Other popular packages available for this
kind of analysis are Mplus (http://www.statmodel.com/),
SAS (PROC CALIS) (http://www.sas.com/),
SPSS (AMOS) (http://www.spss.com/amos/),
EQS (http://www.mvsoft.com/),
MX (http://www.vcu.edu/mx/) and
SmartPLS (http://www.smartpls.de/forum/).

The data used to estimate all models in
this chapter can be found at
www.pitt.edu/~finkel/data.htm.

Chapter 30


Panel analysis with logistic regression 
Scott Menard 


There is an extensive and well-developed lit- 
erature on panel analysis with continuous or 
quantitative variables, measured on an inter- 
val or ratio scale, with models for panel data 
that include approaches based on pooling lon- 
gitudinal and cross-sectional data, fixed effects 
and random effects models, and structural
equation modeling; see, for example, Bijleveld 
et al. (1998), Finkel (1995), Hardin and Hilbe 
(2003), Hsiao (2003), Kessler and Greenberg 
(1981), Maruyama (1998), Sayrs (1989), and 
Wooldridge (2002), plus the chapters in this
volume by Worrall (Chapter 15), Hilbe and 
Hardin (Chapter 28), Greenberg (Chapter 17), 
and Finkel (Chapter 29). Less extensively cov- 
ered, but not entirely absent from this liter- 
ature, is the use of panel analysis when the 
dependent variable is a categorical/qualitative 
dichotomous, nominal, or ordinal variable. Spe- 
cial problems arise in the use of categorical 
dependent variables in panel analysis, includ- 
ing issues of how to measure change and how 
to model the within-cases dependency in mul- 
tiwave panel data. One approach to modeling 
categorical panel data is to use logistic regres- 
sion analysis (Hosmer and Lemeshow, 2000; 
Menard, 2002a). The logistic regression family 
of models is readily adaptable to both dichoto- 
mous and polytomous (nominal or ordinal, and 
with three or more categories) data, allows 
the use of standardized coefficients (Menard,
2004a) to compare the magnitude of effects of
variables which do not have a natural metric,
and is readily accessible in general-purpose sta- 
tistical software packages such as SAS, SPSS, 
and Stata. 

This chapter provides an applied approach 
to the use of logistic regression analysis to 
analyze categorical panel data, beginning with 
the issue of measuring change, then consider- 
ing simple two-wave panel models, and ending 
with a consideration of the use of logistic regres- 
sion analysis for multiwave panel data. The 
data used for the examples are taken from the 
National Youth Survey, a multiwave longitu- 
dinal study of a self-weighting national house- 
hold probability sample of 1725 individuals 
who were 11-17 years old when first inter- 
viewed in 1977, and who were last interviewed 
in 2002. The dependent variable is marijuana 
use, or more specifically change in the preva- 
lence (yes or no) of marijuana use. The pre- 
dictors are (1) exposure to delinquent friends, 
a scale indicating how many of one’s friends 
have engaged in nine different types of delin- 
quent behavior ranging from assault to theft 
to illicit and underage substance use and drug 
sales, plus whether they have encouraged the 
respondent to do anything against the law; (2) 
belief that it is wrong to violate the law, a scale 
indicating how wrong the respondent thinks it 
is to engage in any of nine types of behavior 
(the same as the first nine items in the expo- 
sure to delinquency scale); (3) age, measured as 
years since birth; (4) gender, coded 0 = female, 
1 = male; and (5) ethnicity, with white non- 
Hispanic respondents as the reference category 
and two other categories, African-American and 
Other. Theoretically, marijuana use should be 
positively associated with exposure to delin- 
quent friends and negatively associated with 
belief that it is wrong to violate the law. Age, 
ethnicity, and gender are included as demo- 
graphic controls, but associations of age, gen- 
der, and ethnicity with marijuana use have been 
found in past research. For a more detailed 
description of the sample, the variables, and the 
theoretical basis for the models tested here, see 
Elliott et al. (1989). 

Let us begin by assuming that we have a 
dependent or outcome variable of interest, Y, 
with independent variables or predictors $X_k$, 
where $k = 1, \ldots, K$, and Y and the $X_k$ are 
measured at time periods $t = 1, \ldots, T$. (The 
discussion here does not depend on whether we are 
concerned with a model that is merely predictive, 
or whether we want to make causal inferences.) 
As described in more detail in Menard 
(2002b), there are four “pure” types of causal 
or predictive models, here presented in bivariate 
form involving one predictor X and one 
outcome Y: (A) $X \to Y$, where the value of 
the dependent variable is expressed as a function 
of the value of the independent variable; 
(B) $\Delta X \to Y$, where $\Delta X$ represents a change in 
X, and the value of the dependent variable is 
expressed as a function of the change in the 
independent variable; (C) $X \to \Delta Y$, where $\Delta Y$ 
represents a change in Y and the change in 
the dependent variable is expressed as a function 
of the value of the independent variable; 
and (D) $\Delta X \to \Delta Y$, where the change in the 
dependent variable is expressed as a function 
of the change in the independent variable. 
Models in which the independent variables 
include both level and rate-of-change vari- 
ables (e.g., population density and population 
growth rate as influences on economic devel- 
opment) are also possible. Model (A) requires 


only cross-sectional data (possibly time-ordered 
cross-sectional data; see Menard, 2002a, or the 
brief discussion in Chapter 1 in this volume); 
the other three models require measurement of 
at least one of the variables in the model for at 
least two different times, and models (C) and 
(D) specifically require repeated measurement 
of Y. In the simplest instance for longitudinal 
data analysis, $T = 2$ and we have a two-wave 
panel model, the case on which we will focus 
in this chapter. 


1 Measuring change in categorical 
dependent variables 


Implicit in the notation of $\Delta Y$ in the foregoing 
discussion is the assumption that we know how 
to measure continuity and change in the depen- 
dent variable. For the interval and ratio vari- 
ables that are used as the dependent variables 
in linear regression, the options for measuring 
change are straightforward, involving subtrac- 
tion of the earlier value of Y from the later 
value of Y, where Y is measured on an interval 
or ratio scale. Measuring change for qualita- 
tive variables requires more careful consider- 
ation of what types of change (or continuity) 
are of interest. These possibilities can be rep- 
resented in a contingency table, as illustrated 
in Figure 30.1. Alternatively, using the per- 
centages for the rows or columns representing 
the earlier measurement of Y, we can represent 
the possible patterns of continuity or change 
in a transition matrix which indicates the per- 
centage of cases in each initial category of Y 
that either remain in that category or switch 
categories. Figure 30.1 illustrates contingency 
tables and transition matrices for hypothetical 
dichotomous and polytomous variables. 
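
As a minimal sketch of the step from a contingency table to a transition matrix, the Python below divides each row by its row total. It assumes that rows index the earlier measurement, as in Figure 30.1; the function name `transition_matrix` is ours, and the frequencies are those of parts A and C of the figure.

```python
def transition_matrix(table):
    """Convert a contingency table (rows = time 1 categories,
    columns = time 2 categories) into row proportions."""
    matrix = []
    for row in table:
        total = sum(row)
        matrix.append([count / total for count in row])
    return matrix

# Part A of Figure 30.1: rows are time 1 success, time 1 failure;
# columns are time 2 failure, time 2 success.
dichotomous = [[200, 300],
               [400, 100]]

# Part C of Figure 30.1: a four-category variable.
polytomous = [[200, 100, 75, 25],
              [100, 175, 75, 50],
              [50, 75, 150, 125],
              [25, 50, 125, 200]]

for row in transition_matrix(dichotomous):
    print([round(p, 4) for p in row])
for row in transition_matrix(polytomous):
    print([round(p, 4) for p in row])
```

Each row of the result sums to one, which is what distinguishes the transition matrix from the table of raw frequencies.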

Let us assume that we have a dichotomous 
dependent variable for which 0 = failure and 
1 = success. Part A of Figure 30.1 presents a 
hypothetical contingency table in which we 
have 500 successes and 500 failures at time 1, 
and 400 successes and 600 failures at time 2. 
(A) Contingency table, dichotomous variable 

Frequencies        Time 2 Failure   Time 2 Success 
Time 1 Success          200              300 
Time 1 Failure          400              100 

(B) Transition matrix, dichotomous variable 

Row percentages    Time 2 Failure   Time 2 Success 
Time 1 Success         .4000            .6000 
Time 1 Failure         .8000            .2000 

(C) Contingency table, polytomous variable 

Frequencies          1        2        3        4 
1                  200      100       75       25 
2                  100      175       75       50 
3                   50       75      150      125 
4                   25       50      125      200 

(D) Transition matrix, polytomous variable 

Row percentages      1        2        3        4 
1                .5000    .2500    .1875    .0625 
2                .2500    .4375    .1875    .1250 
3                .1250    .1875    .3750    .3125 
4                .0625    .1250    .3125    .5000 


Figure 30.1 Contingency tables and transition matrices 


Part B of Figure 30.1 presents the correspond- 
ing transition matrix, indicating that 60% of the 
time 1 successes and 20% of the time 1 fail- 
ures were successes at time 2. Now the question 
arises whether we are interested in each of the 
four cells of parts A and B of Figure 30.1 sep- 
arately, or whether we are willing to combine 
some of the cells, in operationalizing $\Delta Y$. 

(a) We can consider each cell separately: 
(1) continuity of failure in the lower-left cell, (2) 
continuity of success in the upper-right cell, (3) 
change from success to failure in the upper-left 
cell, (4) change from failure to success in the 
lower-right cell. This gives us a four-category 
polytomous nominal dependent variable $\Delta Y$, 
which can be analyzed using polytomous nom- 
inal logistic regression. 

(b) We can decide that we are interested in 
the differences in types of changes, and in the 
difference between continuity and each of the 


different types of change, but not in the differ- 
ence between continued success and continued 
failure. This can be accomplished by subtracting 
Y at time 1, abbreviated $Y_{t1}$, from Y at 
time 2, abbreviated $Y_{t2}$, to obtain a polytomous 
ordinal dependent variable $\Delta Y$ coded 0 
for continuity, $-1$ for a change from success to 
failure, and $+1$ for a change from failure to 
success, which can be analyzed using polytomous 
ordinal logistic regression. 

(c) We can decide that we are interested only 
in whether a change occurs, and not in whether 
the change is from success to failure or from 
failure to success. This gives us a dichotomous 
dependent variable $\Delta Y$, which can be coded 0 
for no change and 1 for change. This model can 
be analyzed using dichotomous logistic regres- 
sion. 

(d) We can decide that we are interested only 
in whether one particular change, for example 
the change from failure to success, occurred. 
This also results in a dichotomous dependent 
variable $\Delta Y$, coded 0 if the change did not occur 
and 1 if the change did occur. Two alternatives 
within this coding scheme are (1) to include all 
cases, in which instance cases in all except the 
bottom-right cell (time 1 failure, time 2 success) 
would be coded zero, and cases in the bottom- 
right cell would be coded 1; or (2) exclude those 
cases in the first row, because they could not 
possibly change from failure to success (they 
are all classified as successes in time 1), and 
consider only the second row, coding cases 
as 0 for failure and 1 for success at time 2. 
This approach parallels event history analysis 
insofar as it defines a set of cases “at risk” of 
change in the direction of interest and excludes 
all other cases from the analysis. This second 
option, however, effectively changes the dependent 
variable from $\Delta Y$ to simply $Y_{t2}$. In either 
case, the dependent variable can be analyzed 
using dichotomous logistic regression. 
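
The four options above amount to four different recodings of the same pair of observations. The sketch below is illustrative only: the function `code_change`, its option labels ("a", "b", "c", "d1", "d2"), and the category labels are our own, with 0 = failure and 1 = success as in the text.

```python
def code_change(y1, y2, option):
    """Code change in a 0/1 variable between time 1 (y1) and time 2 (y2).

    Returns None when a case is excluded (option "d2" only).
    """
    if option == "a":      # four nominal categories, one per cell
        labels = {(0, 0): "stay failure", (1, 1): "stay success",
                  (1, 0): "success to failure", (0, 1): "failure to success"}
        return labels[(y1, y2)]
    if option == "b":      # ordinal: -1, 0, +1
        return y2 - y1
    if option == "c":      # any change versus no change
        return int(y1 != y2)
    if option == "d1":     # failure-to-success versus everything else
        return int(y1 == 0 and y2 == 1)
    if option == "d2":     # only cases "at risk" (failures at time 1)
        return y2 if y1 == 0 else None
    raise ValueError(f"unknown option: {option}")

# One case from each cell: (time 1, time 2)
cells = [(1, 0), (1, 1), (0, 0), (0, 1)]
for option in ["a", "b", "c", "d1", "d2"]:
    print(option, [code_change(y1, y2, option) for y1, y2 in cells])
```

Under "d2" the cases that were successes at time 1 are returned as None, which mirrors their exclusion from the set of cases at risk.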

With a polytomous dependent variable, the 
options are similar, but with a potentially larger 
number of categories in the dependent variable 
$\Delta Y$. 

(a) We can consider each different possibil- 
ity for continuity and change separately. If the 
number of categories in Y is $c$, then we have 
$c^2$ cells in the contingency table and hence $c^2$ 
possible values for $\Delta Y$. 

(b) We can ignore differences among continu- 
ities and consider only differences among the 
possible changes, resulting in $c(c-1)$ possible 
values for $\Delta Y$. 

(c) For ordered but not nominal polytomous 
dependent variables, we can consider only 
whether there is no change, coded 0, a change 
from a higher to a lower category, coded $-1$, or a 
change from a lower to a higher category, coded 
$+1$, producing a polytomous ordinal dependent 
variable $\Delta Y$ (the same as option b for a 
dichotomous dependent variable). 

(d) We can consider only whether a change 
occurred or not, regardless of which specific 


change occurred, resulting in a dichotomous 
dependent variable $\Delta Y$ which might be coded 
0 for no change and 1 for change (the same as 
option c for a dichotomous dependent variable). 

(e) For ordered but not nominal polytomous 
dependent variables, we can subtract $Y_{t1}$ from 
$Y_{t2}$, treating the differences in ranks the same 
regardless of the initial rank, for example treat- 
ing movement from category 1 to category 3 as 
being equal to the movement from category 2 
to category 4. This effectively assumes an inter- 
val rather than an ordinal scale, and raises the 
question of whether polytomous ordinal logis- 
tic regression or some other technique such as 
ordinary least squares (OLS) linear regression 
is most applicable. 
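
For an ordered polytomous variable, the counts and codings in the options above can be illustrated with a short sketch. It assumes four ordered categories numbered 1 to 4 (as in part C of Figure 30.1); the function names are ours.

```python
from itertools import product

def sign_of_change(y1, y2):
    """Option (c): -1, 0, or +1 for an ordered variable."""
    return (y2 > y1) - (y2 < y1)

def any_change(y1, y2):
    """Option (d): 1 if the category changed, 0 otherwise."""
    return int(y1 != y2)

def rank_difference(y1, y2):
    """Option (e): difference in ranks, which treats the scale as interval."""
    return y2 - y1

c = 4
categories = range(1, c + 1)
cells = list(product(categories, repeat=2))           # option (a): c**2 cells
changes = [(y1, y2) for y1, y2 in cells if y1 != y2]  # distinct changes: c*(c-1)

print(len(cells), len(changes))
print(sorted({sign_of_change(y1, y2) for y1, y2 in cells}))
print(sorted({rank_difference(y1, y2) for y1, y2 in cells}))
# Option (e) treats a move from 1 to 3 as equal to a move from 2 to 4.
print(rank_difference(1, 3) == rank_difference(2, 4))
```

The last line makes the interval-scale assumption of option (e) visible: two moves that start from different ranks receive the same score.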

What is different about measuring change 
for categorical variables is that the measure 
of change is not defined in terms of some 
mathematical operation or sequence of opera- 
tions; instead, it is a process of deciding which 
changes are really different from which other 
changes, and really coding rather than cal- 
culating the dependent variable to produce a 
measure consistent with our substantive con- 
cerns. Even more than in the case of quantita- 
tive dependent variables, the question raised by 
Cronbach and Furby (1970) becomes pertinent: 
how should we measure change—or should we? 


2 Logistic regression for conditional 
and unconditional change in 
two-wave panel models 


Models involving $\Delta Y$ as the dependent variable 
are called unconditional change models (e.g., 
Finkel, 1995). The conditional change model 
has $Y_{t2}$ instead of $\Delta Y$ as a dependent variable, 
but includes $Y_{t1}$ as a predictor in the model. 
Assuming for the moment that none of the 
predictors is measured as a change score $\Delta X$, 
the unconditional change model may be written 
$\Delta Y = (Y_{t2} - Y_{t1}) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K$. 
The conditional change model may be written 
$Y_{t2} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K + \gamma Y_{t1}$, 
or alternatively 
$(Y_{t2} - \gamma Y_{t1}) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K$. 
Compared to the logistic regression 
model for cross-sectional data, the conditional 
change model just adds one more 
predictor, the lagged dependent variable $Y_{t1}$. 
Comparing the equations for the unconditional 
and conditional change models, the unconditional 
change model effectively assumes that 
$\gamma = 1$, regardless of the other predictors in the 
model (hence unconditionally), while in the 
conditional change model the value of $\gamma$ is 
conditional on (controls for) the other predictors 
in the model. Also, as described above, 
the unconditional change model for categorical 
dependent variables raises the issue of how 
to measure change in the dependent variable. 
In the conditional change model, how to measure 
$\Delta Y$ need not be an issue. Instead, the usual 
procedures for estimating dichotomous, nominal 
polytomous, or ordinal polytomous logistic 
regression models may be used, the only change 
being that the lagged value of the dependent 
variable ($Y_{t1}$) is included in the model. (As 
discussed below, the measurement of $Y_{t1}$ and $Y_{t2}$ 
may not be identical for a polytomous nominal 
dependent variable, but this is at worst a minor 
issue.) Finally, both the unconditional and the 
conditional change models can be extended to 
include more than two time periods. 
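
The relation between the two models can be made explicit with one line of algebra. This is a sketch in the linear form used for the equations above, writing $Y_{t1}$ and $Y_{t2}$ for Y at times 1 and 2; for a categorical outcome the left-hand side would be replaced by the appropriate logit. Subtracting $Y_{t1}$ from both sides of the conditional change model gives:

```latex
\begin{aligned}
Y_{t2} &= \alpha + \beta_1 X_1 + \ldots + \beta_K X_K + \gamma Y_{t1} \\
Y_{t2} - Y_{t1} &= \alpha + \beta_1 X_1 + \ldots + \beta_K X_K + (\gamma - 1)\, Y_{t1}
\end{aligned}
```

The lagged term vanishes exactly when $\gamma = 1$, which reproduces the unconditional change model; any other value of $\gamma$ leaves $Y_{t1}$ in the equation as a predictor of change.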

The coefficient of the lagged endogenous variable, 
$\gamma$, is sometimes called the “stability coefficient”. 
There are several interpretations of 
the stability coefficient which are statistically 
indistinguishable; which interpretation is most 
appropriate must be decided based on concep- 
tual or theoretical considerations (Davies, 1994; 
Finkel, 1995; Kessler and Greenberg, 1981; 
Rogosa, 1995). Most commonly, it is interpreted 
either as a control for prior, unmeasured influ- 
ences on Y, or as the inertial effect of past values 
of Y on the present value of Y. Alternatively, 
it may be interpreted as doing several things at 
once. Davies (1994) indicates that the stability 
coefficient may represent the impact of a pre- 
vious state or behavior on a present state or 


behavior, plus prior impacts of measured vari- 
ables, plus effects of unmeasured variables, on 
the dependent variable $Y_t$. Because the stability 
coefficient may be incorporating more than 
a single type of effect, the conditional change 
model typically provides a liberal estimate of 
inertial effects, and a conservative estimate of 
the effects of other predictors in the model. In 
this respect, as Davies (1994, pp. 36-37) notes, 
the conditional change model is far from per- 
fect; but in light of the observation by the same 
author (Davies, 1994, p. 32) that the impact of 
interventions is often less than that predicted by 
statistical models, this characteristic may actu- 
ally be a desirable feature of the conditional 
change model. 

In both the conditional and unconditional 
change models, the predictors may or may not 
be cast as change scores, either $X$ or $\Delta X$. If, in the 
unconditional change model, all of the predictors 
are also change scores, $\Delta X$, so change scores 
are used for the predictors as well as the depen- 
dent variable, we have a first difference model, 
model D above. The first difference model is so 
named because it takes the difference between 
adjacent measurement periods (the first differ- 
ence, in time series terminology), but not dif- 
ferences between first differences (the second 
difference) or higher ordered differences. In first 
difference and higher order difference mod- 
els, in addition to the usual issues involved in 
change models, predictors that do not change 
from one period to the next (stable individual 
characteristics) are eliminated from the model, 
potentially, as noted by King (1989) in the con- 
text of time series models, leaving more vari- 
ation attributable to error. The first difference 
form of the unconditional change model may 
be estimated by either ordinary dichotomous 
logistic regression, or (as detailed below), if we 
are interested in adjusting for individual differ- 
ences in patterns of change, conditional logistic 
regression. (The paradox in terminology, using 
conditional logistic regression to estimate an 
unconditional change model, simply reflects 
the different things being conditioned in the 
change model and the estimation technique.) 

A consideration in both the conditional and 
unconditional change models is when to mea- 
sure the predictors relative to the dependent 
variable. Measuring the predictors and the 
dependent variable for the same time intro- 
duces ambiguity in the time ordering of cause 
and effect, and may produce overestimates of 
the impact of the predictor on the dependent 
variable when there are spurious time-specific 
effects. Measuring the predictors for a time prior 
to the time for which the dependent variable 
is measured helps eliminate this ambiguity in 
temporal ordering (but does not remove it com- 
pletely; see, e.g., Menard, 2002b, pp. 18-21), but 
raises the possibility that the time lag between 
cause and effect may be too long, resulting in an 
underestimation of the impact of the predictor 
on the dependent variable. 

In the context of linear regression models 
with interval or ratio-scaled dependent vari- 
ables, there has been considerable disagreement 
in the social and behavioral sciences about 
the relative appropriateness of unconditional as 
opposed to conditional change models when 
the purpose is to analyze change in panels with 
a small number of periods. This debate is relevant 
primarily (a) to the analysis of short-term 
change within individuals and (b) when the dependent 
variable is measured as $\Delta Y = Y_{t2} - Y_{t1}$ on 
an interval or ratio scale. Arguments against the 
use of the unconditional change model are that 
change scores $\Delta Y$ are systematically related to 
any random error of measurement, that they are 
typically less reliable than the raw scores of 
the variables from which they are calculated, 
and that the unreliability of change scores may 
lead to fallacious conclusions or false infer- 
ences (Cronbach and Furby, 1970). Arguments 
in favor of the use of change scores, usually 
cast in the context of the unconditional change 
model, are based on the assumption that we are 
interested in explaining intraindividual change 
rather than in causal analysis of differences 


among individuals; that the number of periods 
is small (typically no more than three); and 
that certain other assumptions are met. Liker 
et al. (1985) demonstrated that the uncondi- 
tional change model may be superior to both 
cross-sectional equations and the conditional 
change model when (a) regression parameters 
remain constant from one period to another, 
(b) there are unmeasured variables that influ- 
ence the dependent variable but do not change 
over time, (c) there is autocorrelated error in 
the measurement of those variables which both 
influence the dependent variable and vary over 
time, and (d) the panel data give more reliable 
measurement of changes in predictor variables 
over time than of the level or value of predictor 
variables at any given time, as may be the case if 
interindividual differences in change are large 
relative to interindividual differences in initial 
scores. 

The conditions under which the uncondi- 
tional change model is preferable to the lagged 
endogenous variable model are quite restric- 
tive, unlikely to be met in most observa- 
tional research, and difficult to meet even in 
experimental or quasi-experimental research 
(Bijleveld et al., 1998; Cronbach and Furby, 
1970; Finkel, 1995). In addition, the conditional 
change model may be more appropriate when 
there is a true causal effect of $Y_{t1}$ on $Y_{t2}$. As 
Davies (1994, p. 33) explains, “positive tempo- 
ral dependence, or inertia, is to be expected 
of most social behaviour.” Finkel (1995, p. 7) 
notes that the prior value of Y may influence 
the current value of Y, and the influence of 
$Y_{t1}$ on $Y_{t2}$ may be misspecified by an unconditional 
change model. On balance, it appears 
that the conditional change model is more 
generally valid than the unconditional change 
model. In addition, in the present context of 
qualitative dependent variables, it is easier to 
decide what model to estimate for the condi- 
tional change model. As noted earlier, there 
are several possibilities for operationalizing the 
measurement of $\Delta Y$ for a categorical variable 
in the unconditional change model, each of 
which requires a different method of calcula- 
tion (or coding) and different interpretation of 
the results. In the conditional change model, 
however, the dependent variable as modeled 
can often have the same coding as the depen- 
dent variable as originally coded, the approach 
to estimating and interpreting the model is sim- 
ply the usual procedure for dichotomous or 
polytomous logistic regression analysis, and the 
model uses at most $c - 1$ (where $c$ is the number 
of categories in the dependent variable) sets 
of coefficients for predicting assignment to the 
different categories of a polytomous dependent 
variable (and possibly only one set of coeffi- 
cients for the predictors in either dichotomous 
logistic regression or an equal slopes/unequal 
intercepts ordinal logistic regression model for 
a polytomous ordinal dependent variable). 


3 The subject-specific model for a 
two-wave panel 


In both the conditional and unconditional 
change models, each case (at time 1) serves as 
its own control (for time 2), but in the uncon- 
ditional change model, the strength of the rela- 
tionship between the scores at time 1 and time 
2 is not considered to be a parameter we are 
interested in estimating. An implicit assump- 
tion in both of these models is that the change 
process is the same across individuals. In par- 
ticular, it is assumed that the impact of the 
independent variables on the outcome, as measured 
by the $\beta$ parameters, is the same for each 
case, and that each case has the same marginal 
probability, as measured by $\alpha$. As described by 
Agresti (2002), this model is called the marginal 
model, because it focuses on the marginal dis- 
tributions of responses for the observations, and 
the effects in the marginal model are termed 
population averaged effects, because the effects 
are all averaged over the entire population or 
sample, rather than being measured separately 
for each case. 


An alternative assumption is that the process 
of change is unique for each individual, and 
that we must somehow incorporate this uniqueness 
into the analysis. In this conceptualization, 
each case is represented by (at least) two 
observations. The extreme situation would be 
one in which each parameter varied by individual, 
producing the model 
$\operatorname{logit}(Y) = \alpha_i + \beta_{1i} X_1 + \beta_{2i} X_2 + \ldots + \beta_{Ki} X_K$, 
where the subscript 
$i = 1, 2, \ldots, n$ refers to the specific case, and 
the first subscript on the $\beta$ coefficients 
corresponds to the subscript for the $K$ predictors 
$X_1, X_2, \ldots, X_K$. With only two observations 
per case, one cannot realistically estimate this 
model. A simpler model assumes that the effect 
parameters are constant across cases, but that 
the intercepts vary by subject, producing the 
model 
$\operatorname{logit}(Y) = \alpha_i + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K$, 
which differs from the marginal model only 
in the subscript on the intercept, indicating a 
different intercept for each case. According to 
Agresti (2002, pp. 414-415), this model may 
be described as a conditional model, “since 
the effect $\beta$ is defined conditional on the 
subject.... The effect is subject-specific, since it is 
defined at the subject level.” Actually, Agresti 
was referring to a model with only one predictor, 
and it is only the intercept that is subject-specific. 
With more observations per case, it is 
possible to construct models in which the $\beta$ 
parameters as well as the intercept are subject-specific. 
This model essentially differs from the 
marginal model by allowing each case to have 
its own probability distribution, as represented 
by $\alpha_i$. In this model, if the $\alpha$ coefficients are 
large relative to the $\beta$ coefficients, the (shared) 
$\alpha_i$ within each case may determine the outcome, 
resulting in the same value of Y for both time 
periods. Alternatively, if $\alpha_i$ is small relative to 
the $\beta$ parameters, it is possible to have different 
outcomes at the two time periods or for the two 
matched subjects. In this sense, the outcome for 
one of the paired observations is not necessarily 
independent of the outcome on the other, and 




this dependence needs to be taken into account 
unless all of the $\alpha_i$ are equal. 

The large number of parameters $\alpha_i$ is problematic 
for fitting the model using maximum 
likelihood, whose fit depends on the number 
of cases being large relative to the number of 
parameters. In the marginal model, the num- 
ber of cases can increase indefinitely while 
the number of parameters remains small, thus 
resulting in the large sample properties that 
make maximum likelihood estimation advan- 
tageous. When, instead, with the addition of 
every new case a new parameter is added to 
the model as well, the maximum likelihood 
estimates will be inconsistent (Andersen, 1970; 
Chamberlain, 1980; Neyman and Scott, 1948). 
The problem can be resolved if the $\alpha_i$ are treated 
as “nuisance parameters” in which we have 
no direct interest. This is done in conditional 
logistic regression. Instead of using a likelihood 
function that explicitly includes the $\alpha_i$, 
conditional logistic regression “conditions” the 
likelihood function on sufficient statistics for the $\alpha_i$. 

A sufficient statistic summarizes the infor- 
mation about a particular population parameter 
(such as the $\alpha_i$) such that if some function of 
the outcome Y depends on the distribution of 
the parameter, the sufficient statistic incorpo- 
rates sufficient information about that param- 
eter that the conditional distribution of the 
sample depends on the sufficient statistic and 
not, in addition to the sufficient statistic, the 
parameter itself. In conditional logistic regression, 
the sufficient statistics for the $\alpha_i$ are their 
pairwise totals of “successes”. If we set $Y_{i1} = 0$ 
for “failure” and $Y_{i1} = 1$ for “success” at time 
1 for case $i$, and $Y_{i2} = 0$ for “failure” and $Y_{i2} = 1$ 
for “success” at time 2 for the same case $i$, 
then the sum $S_i = Y_{i1} + Y_{i2}$ is the pairwise total 
of successes for $Y_i$. When the observations are 
identical on Y, the sufficient statistic $S_i$ is equal 
to either 2 (both successes) or 0 (both failures); 
when $S_i = 1$, the outcomes differ. As described 
by Agresti (2002, p. 416), the distribution of 
$(Y_{i1}, Y_{i2})$ depends on $\beta$ only when the outcomes 
differ for the two responses, and the conditional 
distribution is equal to 
$\exp(\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K)/[1 + \exp(\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K)]$ 
when $Y_{i1} = 0$ and $Y_{i2} = 1$, and equal to 
$1/[1 + \exp(\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K)]$ 
when $Y_{i1} = 1$ and $Y_{i2} = 0$, where “$\exp(Y)$” 
represents the exponential function $e^Y$. 
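
A small numerical check may help show why conditioning removes the case-specific intercept. The sketch assumes, as is standard for the matched-pairs logit model but not spelled out above, that the two responses are independent given the case, and that the linear predictor in the conditional distribution is the time 2 minus time 1 difference for the case. The function names and the values of the linear predictors are hypothetical.

```python
import math

def success_prob(alpha, xb):
    """P(Y = 1) under logit(Y) = alpha + xb."""
    return 1.0 / (1.0 + math.exp(-(alpha + xb)))

def prob_01_given_discordant(alpha, xb1, xb2):
    """P(Y1 = 0, Y2 = 1 | Y1 + Y2 = 1) for one case."""
    p1 = success_prob(alpha, xb1)
    p2 = success_prob(alpha, xb2)
    p01 = (1.0 - p1) * p2   # failure at time 1, success at time 2
    p10 = p1 * (1.0 - p2)   # success at time 1, failure at time 2
    return p01 / (p01 + p10)

xb1, xb2 = 0.4, 1.1           # hypothetical linear predictors at times 1 and 2
diff = xb2 - xb1
closed_form = math.exp(diff) / (1.0 + math.exp(diff))

# The conditional probability is the same for every value of alpha.
for alpha in (-3.0, 0.0, 2.5):
    print(alpha, round(prob_01_given_discordant(alpha, xb1, xb2), 6),
          round(closed_form, 6))
```

Whatever intercept is supplied, the conditional probability agrees with the closed-form expression, which contains no intercept at all.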


4 Estimation of the conditional 
logistic regression model 


Let $i = 1, 2, \ldots, n$ denote the cases. Within each 
case, let $t = 1, 2, \ldots, T_i$ denote the observations. 
Let the dependent variable Y take on observed 
values $y_{i1}, y_{i2}, \ldots, y_{iT_i}$, where each $y_{it}$ is equal 
to either zero or one, for the $T_i$ observations 
within each case. Let there be $K$ predictors, 
$X_1, X_2, \ldots, X_K$. Let $h_{1i}$ be the number of 
observations in case $i$ for which $y_{it} = 1$, or 
equivalently the sum (over $t$) of the observed 
values, $\sum_t y_{it}$. In the most general 
instance, the number of observations for which $y_{it} = 1$ 
is $h_{1i}$ and the number of observations for which $y_{it} = 0$ 
is $h_{2i} = T_i - h_{1i}$ within each case. Using boldface 
type here to designate vectors, let $\mathbf{x}_{it}$ represent 
the vector of values of the predictors $X_1, X_2, \ldots, X_K$, 
let $\boldsymbol{\beta}$ represent the vector of $K$ coefficients 
for the predictors, $\beta_1, \beta_2, \ldots, \beta_K$, and let $\exp(Y)$ 
represent the exponential function $e^Y$. Let $S_i$ be 
the set of all possible combinations of $h_{1i}$ ones 
and $h_{2i}$ zeros in case $i$, and let $\mathbf{d}_i$ be an 
element of $S_i$. The individual components of each 
(vector) element of $S_i$ are designated $d_{it}$, and 
correspond to the possible values for each 
corresponding $y_{it}$ for different possible vectors $\mathbf{y}_i$. 

Given these definitions, the probability 
of a possible set of values of 
Y for an individual, the vector 
$\mathbf{y}_i = \{y_{i1}, y_{i2}, \ldots, y_{iT_i}\}$, is equal to 
$P[\mathbf{y}_i \mid \sum_t y_{it} = h_{1i}] = \exp(\sum_t y_{it}\mathbf{x}_{it}\boldsymbol{\beta}) / [\sum_{\mathbf{d}_i \in S_i} \exp(\sum_t d_{it}\mathbf{x}_{it}\boldsymbol{\beta})]$. 
That is, the 
probability of a particular set of observed 
values for Y for a particular case is equal to 
the exponentiated sum (over all observations 
in the case) of the products involving the 
observed values of Y, the observed values 
of the predictors, and the coefficients of the 
predictors, divided by the sum 
(over all possible combinations of values for Y 
in the case) of the exponentiated sums of 
products of the possible values 
of Y ($d_{it}$), the observed values of the predictors, 
and the coefficients of the predictors. The 
corresponding conditional log-likelihood is 
$L = \sum_i \{\sum_t y_{it}\mathbf{x}_{it}\boldsymbol{\beta} - \ln[\sum_{\mathbf{d}_i \in S_i} \exp(\sum_t d_{it}\mathbf{x}_{it}\boldsymbol{\beta})]\}$. 
This conditional probability does not involve the 
$\alpha_i$, so the $\alpha_i$ are not estimated when the 
conditional likelihood is used. 
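
The conditional probability and log-likelihood defined above can be written out directly by enumerating the elements of the set of possible outcome patterns for each case. The following is a minimal sketch rather than an estimation routine: the function names, the coefficient values, and the data are hypothetical, and full enumeration is practical only when the number of observations per case is small.

```python
import math
from itertools import combinations

def linear_predictor(x_row, beta):
    """Sum of predictor values times coefficients for one observation."""
    return sum(b * x for b, x in zip(beta, x_row))

def case_conditional_prob(y, x, beta):
    """Probability of the observed pattern y for one case, given the
    number of ones in y.

    y: list of 0/1 outcomes; x: list of predictor rows (one per observation).
    """
    xb = [linear_predictor(row, beta) for row in x]
    h1 = sum(y)
    numerator = math.exp(sum(yt * s for yt, s in zip(y, xb)))
    # Each possible pattern is identified by the positions of its h1 ones.
    denominator = sum(math.exp(sum(xb[t] for t in ones))
                      for ones in combinations(range(len(y)), h1))
    return numerator / denominator

def conditional_log_likelihood(cases, beta):
    """Sum over cases of the log conditional probability."""
    return sum(math.log(case_conditional_prob(y, x, beta)) for y, x in cases)

beta = [0.3, -0.2]                               # hypothetical coefficients
changer = ([0, 1], [[1.0, 0.5], [2.0, -0.5]])    # Y changes between waves
stayer = ([1, 1], [[1.0, 0.5], [2.0, -0.5]])     # Y is constant
print(case_conditional_prob(*changer, beta))
print(case_conditional_prob(*stayer, beta))
print(conditional_log_likelihood([changer, stayer], beta))
```

For the case whose outcome never changes there is only one possible pattern, so its conditional probability is 1 and it adds nothing to the log-likelihood.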


5 The fixed effects model using 
conditional logistic regression 


In the conditional logistic regression model, 
as described above, 
$\operatorname{logit}(\Delta Y) = \Delta\alpha_i + \beta_1 \Delta X_{1i} + \beta_2 \Delta X_{2i} + \ldots + \beta_K \Delta X_{Ki}$, 
where $i$ indexes the cases 
within which the observations are clustered. 
The unit-specific intercepts are assumed to be 
constant within the cases, so $\Delta\alpha_i = 0$ and drops 
out of the equation. Estimating this model using 
full maximum likelihood results in inconsistent 
estimates for the $\alpha_i$ and the $\beta_k$ (Andersen, 1970; 
Chamberlain, 1980). Although the conditional 
logistic regression model most closely parallels 
model D at the beginning of this chapter 
(the unconditional change model with change 
scores as predictors, $\Delta X \to \Delta Y$), it is different 
from the model obtained using unconditional 
logistic regression in several respects: (1) cases 
for which Y is a constant (and hence $\Delta Y = 0$) 
are dropped from the analysis, thus potentially 
excluding cases for which the predictors have 
consistently high or consistently low values 
within the case over time; (2) predictors which 
are constant within cases are also dropped 
from the model, thus potentially excluding 
predictors which might, although constant, affect 
the probability of change; (3) it includes the 
assumption of fixed within-case effects, $\alpha_i$, and 
if the within-case effect is not fixed, then the 
model is misspecified; and (4) the model is 
actually “retrodicting” the independent variables 
from the dependent variable. Regarding 
this last point, it is actually the $y_{it}$ that are fixed 
(only those cases for which $\Delta Y = 1$ are included 
in the analysis), and it is the values of the $x_{it}$ 
that are allowed to vary. All of these differences 
may lead to results which may be different from 
results produced using other methods of 
estimation for longitudinal models. 


Table 30.1 Conditional logistic regression fixed effects model 

Dependent variable        $R_L^2$/$R_O^2$    Independent variables    b*        b         p(b) (Wald) 
Change in marijuana use   0.016/0.133        Exposure                 -0.128    -0.029    0.403 
                                             Belief                   -0.421    -0.099    0.027 

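
Points (1) and (2) can be checked numerically for the two-wave case. The sketch below assumes two observations per case and uses hypothetical predictor values and coefficients; `pair_prob` is our name for the conditional probability of the observed pair of outcomes given the number of successes in the pair.

```python
import math

def pair_prob(y, x, beta):
    """Conditional probability of the observed pair y = (y1, y2),
    given y1 + y2, for predictor rows x = (x1, x2)."""
    xb = [sum(b * v for b, v in zip(beta, row)) for row in x]
    if y[0] == y[1]:
        # Only one arrangement has this total, so the probability is 1
        # whatever beta is: the case carries no information.
        return 1.0
    observed = xb[0] if y[0] == 1 else xb[1]
    return math.exp(observed) / (math.exp(xb[0]) + math.exp(xb[1]))

# The second predictor (for example, a 0/1 indicator) is constant
# within the case; the first changes between waves.
x = ([2.0, 1.0], [3.5, 1.0])

# (1) No change in Y: the contribution does not depend on beta.
print(pair_prob((1, 1), x, [0.4, 0.0]), pair_prob((1, 1), x, [-2.0, 5.0]))

# (2) Constant predictor: its coefficient cancels from the ratio.
print(pair_prob((0, 1), x, [0.4, 0.0]), pair_prob((0, 1), x, [0.4, 5.0]))
```

Because the coefficient of a within-case constant cancels from numerator and denominator, the conditional likelihood contains no information with which to estimate it.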

In Table 30.1, a conditional logistic regres- 
sion model is presented for the unconditional 
change in marijuana use from time 1 to time 
2 using data from the National Youth Sur- 
vey (Elliott et al., 1989) for the years 1979 
and 1980. The theoretical predictors for change 
in marijuana use are exposure to delinquent 
friends, belief that it is wrong to violate the 
law, age, gender, and ethnicity (White, African-American, 
Other). In principle, 
conditional logistic regression is not limited in 
the number of time periods that can be incorpo- 
rated into the model, but use of a large number 
of time periods makes it important to consider 
nonlinear relationships between the dependent 
variable and the time dimension, if the time 
dimension (age or period) is included in the 
model. 

The first column in Table 30.1 lists the dependent variable. The second column lists
two measures of explained variation, the likelihood ratio coefficient of determination R_L² and
the squared correlation between the observed (0 or 1) and predicted (continuous probability
between zero and one) values of the dependent variable, R_O². Arguments have been made
for the latter measure in the context of logistic regression analysis, but evidence to date
appears to indicate that for logistic regression analysis using maximum likelihood estimation,
R_L² is the more appropriate measure of explained variation, although the two typically
produce similar results (Menard, 2000). The third column lists the predictors, and the
fourth column lists the fully standardized logistic regression coefficients, which are directly
analogous and behave similarly to the standardized coefficients used in OLS regression
(Menard, 2004a). The fifth column lists the unstandardized logistic regression coefficients,
and the last column lists their statistical significance based on the Wald test. For a general
treatment of standardized and unstandardized coefficients in logistic regression analysis, see
Menard (2002a).
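
The two measures of explained variation can be illustrated with a short Python sketch. It assumes that R_L² is computed as the proportional reduction in the log-likelihood relative to an intercept-only model, and that R_O² is the squared Pearson correlation between the observed 0/1 outcome and the predicted probability. The function names, outcomes, and fitted probabilities are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))

def r2_likelihood_ratio(y, p):
    """R_L^2: proportional reduction in the log-likelihood relative to an
    intercept-only model, which predicts the sample mean for every case."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null

def r2_observed_predicted(y, p):
    """R_O^2: squared Pearson correlation between observed y and predicted p."""
    n = len(y)
    mean_y, mean_p = sum(y) / n, sum(p) / n
    cov = sum((yi - mean_y) * (pi - mean_p) for yi, pi in zip(y, p))
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    var_p = sum((pi - mean_p) ** 2 for pi in p)
    return cov ** 2 / (var_y * var_p)

# Hypothetical observed outcomes and fitted probabilities from some model
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]
print(round(r2_likelihood_ratio(y, p), 3), round(r2_observed_predicted(y, p), 3))
```

Because the probabilities here are made up rather than estimated by maximum likelihood, the two values only illustrate the arithmetic. They say nothing about how far the measures diverge in practice.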

In Table 30.1, only 462 observations were 
used in calculating the conditional logistic 
regression model. At 2 observations per case, 
this means only 231 of the 1725 cases had 
valid data indicating change, with most (1257) 
having had valid data indicating no change 
in the dependent variable. As noted earlier, 
cases for which there is no change are dropped 
from the analysis, and this example illustrates 
the loss of data that can occur when using 
conditional logistic regression to estimate the 
unconditional change model accounting for 
interindividual heterogeneity (the individual-specific
intercepts a_i). In addition, the only
predictors in the first row of Table 30.1 are 
Exposure and Belief; gender and ethnicity are 
constants within cases, and so because they do 
not vary over time they are necessarily dropped 
from the model, and the difference in age is a 
constant (everyone is one year older in the later 
wave than in the earlier wave), so age, too, is 
excluded from the model. 
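
The case accounting in this paragraph can be checked with a few lines of Python. The three counts are the ones reported in the text. Treating the remaining cases as lacking valid data on the dependent variable is an assumption made here for illustration.

```python
total_cases = 1725          # cases in the sample, as reported in the text
observations_used = 462     # observations entering the conditional model
no_change_cases = 1257      # valid data, but no change in marijuana use

changers = observations_used // 2                     # two observations per case
valid_cases = changers + no_change_cases
other_cases = total_cases - valid_cases               # assumed: lacking valid data

print(changers, valid_cases, other_cases)
print(round(100 * changers / total_cases, 1))         # percent of all cases retained
print(round(100 * no_change_cases / valid_cases, 1))  # percent of valid cases unchanged
```

On these figures, the conditional model rests on roughly 13% of the sample, while about 84% of the cases with valid data show no change and are set aside.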

The influence of exposure is unexpectedly in 
the negative direction, but not statistically sig- 
nificant; the influence of belief is, as expected, 
negative and statistically significant, with a 
standardized coefficient of -.421. R_L² = .02,
only marginally significant at p = .0751, but
R_O² = .13, suggesting a slightly stronger relationship
(and a larger discrepancy than one
would normally expect in the conclusions to 
be derived from the two measures of explained 
variation). These results are distinctly at odds 
with other findings (e.g., Elliott et al., 1989) that 
exposure is the strongest predictor of marijuana 
use. Why the difference? Part of the explanation 
lies in the fact that individuals with very high 
levels of exposure who use marijuana at both 
times, and individuals with very low levels of 
exposure who use marijuana at neither time, 
have been eliminated from the analysis because 
the dependent variable is a constant. These 
results, then, suggest caution when applying 
and interpreting the fixed effects model using 
conditional logistic regression to the analysis of 
longitudinal data. The exclusion of cases which 
do not change on the dependent variable is 
potentially highly problematic, as these may be 
the cases which contribute the most to the asso- 
ciation between the dependent variable and the 
predictors. 


6 Unconditional logistic regression 
for the unconditional change model 


Using unconditional as opposed to conditional 
logistic regression for the unconditional change 
model, four models were calculated for change 
in marijuana use as the dependent variable. 
In all four models, change in marijuana use 
is measured by subtracting marijuana use in 
1979 from marijuana use in 1980, resulting in 
an ordinal dependent variable coded -1 for
change from use to nonuse, +1 for change from 
nonuse to use, and 0 for no change (both users 
and nonusers). In the first model, constructed 
to parallel as closely as possible the condi- 
tional logistic regression analysis above, change 
in exposure and change in belief are the pre- 
dictors, calculated as 1980 exposure or belief 
minus 1979 exposure or belief. Here, the choice 
is made to consider changes in the independent 
variables that occur contemporaneously with 
the change in the dependent variable. The alternative,
measuring change in the independent
variables strictly prior to change in the dependent
variable (i.e., measuring change in the predictors
from 1978 to 1979), would result in a
model in which the time ordering was clearly 
correct between predictor and outcome, but in 
which the lag time might be too long, result- 
ing in lower levels of explained variation and 
smaller standardized and unstandardized coef- 
ficients for the predictors. In the second model, 
age is added to the equation. In the third model, 
age is dropped and gender and ethnicity are 
added to the equation. The fourth and final
analysis includes age, gender, ethnicity, change
in exposure, and change in belief as predic- 
tors. Analysis is performed using the cumula- 
tive logit model, and the results are presented 
in Table 30.2. The threshold coefficients in 
Table 30.2 are analogous to intercepts in OLS 
regression and dichotomous logistic regression, 
and are of no substantive interest, but are pre- 
sented for completeness. Note that we have now 
moved from a fixed effects to a marginal model. 
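
The coding of the change scores described above can be sketched in Python as follows. The variable names and the respondent values are hypothetical. Only the coding rule itself (later value minus earlier value, giving -1, 0, or +1 for a dichotomous outcome) is taken from the text.

```python
def change_score(later, earlier):
    """Unconditional change: later value minus earlier value."""
    return later - earlier

# Hypothetical respondents; use is coded 0 = nonuser, 1 = user
respondents = [
    {"use79": 0, "use80": 1, "exp79": 10, "exp80": 14, "belief79": 20, "belief80": 17},
    {"use79": 1, "use80": 0, "exp79": 15, "exp80": 12, "belief79": 16, "belief80": 19},
    {"use79": 1, "use80": 1, "exp79": 18, "exp80": 18, "belief79": 14, "belief80": 14},
    {"use79": 0, "use80": 0, "exp79": 9, "exp80": 9, "belief79": 22, "belief80": 23},
]

labels = {-1: "use to nonuse", 0: "no change", 1: "nonuse to use"}

for r in respondents:
    d_use = change_score(r["use80"], r["use79"])          # -1, 0, or +1
    d_exp = change_score(r["exp80"], r["exp79"])          # contemporaneous change
    d_belief = change_score(r["belief80"], r["belief79"])
    print(d_use, labels[d_use], d_exp, d_belief)
```

Continuing users and continuing nonusers both receive a change score of 0, so the middle category mixes two quite different groups.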

Table 30.2 Unconditional change models for marijuana use

Dependent    R_L² / R_O²   Independent variables    b*       b         p(b)
variable                                                               (Wald)

Change in    .026 / .028   Change in exposure       .146     .114      .000
marijuana                  Change in belief        -.057    -.051      .030
use                        Threshold 1               --    -2.798      .000
                           Threshold 2               --     2.306      .000

             .033 / .034   Change in exposure       .142     .113      .000
                           Change in belief        -.054    -.048      .039
                           Age                     -.082    -.082      .002
                           Threshold 1               --    -4.788      .000
                           Threshold 2               --      .361      .566

             .034 / .036   Change in exposure       .155     .120      .000
                           Change in belief        -.058    -.051      .029
                           Gender (male)            .005     .026      .855
                           Ethnicity:
                             African-American       .087     .656      .001
                             Other                  .033     .381      .214
                           Threshold 1               --    -3.750      .000
                           Threshold 2               --     1.401      .000

             .039 / .042   Change in exposure       .151     .119      .000
                           Change in belief        -.054    -.049      .038
                           Age                     -.080    -.113      .003
                           Gender (male)            .008     .045      .755
                           Ethnicity:
                             African-American       .083     .638      .001
                             Other                  .027     .319      .299
                           Threshold 1               --    -5.600      .000
                           Threshold 2               --     -.406      .565


In Table 30.2, each successive model explains
a little more of the variation in the dependent
variable than the previous model, although the
overall level of explanation is low, according
to both R_L² and R_O². This appears to be
because the cases are heavily clustered in the 
“no change” category of the dependent vari- 
able. For all of the models, the standard test for 
nonparallel slopes in ordinal logistic regression 
analysis was not statistically significant, indi- 
cating that the parallel slopes model fits well. 
Turning to the individual predictors, change 
in exposure is consistently statistically signifi- 
cant and the strongest predictor, based on the 
standardized coefficient. An increase in expo- 
sure is associated with higher change scores 
for marijuana use. When they are included in 
the model, age and African-American ethnic- 
ity are also statistically significant and nearly 
equal in magnitude. Increasing age is nega- 
tively associated, and African-American eth- 
nicity is positively associated with the change 
score for marijuana use. Further exploration of
these relationships suggests that (a) marijuana
use tends to be initiated at younger ages, and 
to decline at older ages, producing the pattern 
of results with age, and (b) African-American 
respondents tend to initiate marijuana use later 
than non-African-American respondents, with 
the result that they are more likely than non- 
African-American respondents to be increasing 
their marijuana use at the ages (14-21) covered 
in these waves of the NYS. Fourth in magnitude 
of effect is change in belief, with the expected 
negative relationship between stronger beliefs 
that it is wrong to violate the law and higher 
change scores for marijuana use. Gender and 
“Other” ethnicity are not statistically signifi- 
cant predictors of marijuana use. Males are no 
more likely than females, and “Other” ethnic 
groups no more likely than white non-Hispanic 
Europeans, to have higher change scores on 
marijuana use. 
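
To see how thresholds and slopes of this kind translate into predicted probabilities, a minimal Python sketch of the cumulative logit model follows. It assumes the parameterization logit P(Y ≤ j) = threshold_j − (b1·x1 + b2·x2), under which a positive coefficient shifts probability toward higher categories. Sign conventions differ across software, so this is an assumption rather than a statement about how Table 30.2 was produced. The thresholds and coefficients are those printed for the first model in Table 30.2, and the predictor values are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(thresholds, coefficients, values):
    """Probabilities of each ordered category under a cumulative logit model.

    Assumes logit P(Y <= j) = threshold_j - sum(b * x); other software may
    use the opposite sign for the linear predictor.
    """
    eta = sum(b * x for b, x in zip(coefficients, values))
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities

thresholds = [-2.798, 2.306]     # Threshold 1 and Threshold 2, first model
coefficients = [0.114, -0.051]   # b for change in exposure, change in belief

# Hypothetical change scores: (change in exposure, change in belief)
for values in [(0, 0), (5, 0), (0, 5)]:
    p_down, p_same, p_up = category_probabilities(thresholds, coefficients, values)
    print(values, round(p_down, 3), round(p_same, 3), round(p_up, 3))
```

Under these assumptions, a respondent with no change in either predictor has most of the probability mass in the "no change" category, which is consistent with the heavy clustering noted above. A single set of slopes serves both cumulative splits, which is the parallel slopes assumption.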

A few comments on these results are in order. 
First, the relatively low levels of explained vari- 
ation when using change scores (for exposure 
and belief) as predictors of change scores are 


not uncommon in social science research. Sec- 
ond, the use of change scores as predictors 
often introduces one of two problems: an excessively
long lag between change in the independent
variables and change in the dependent variable,
or else ambiguous temporal (and hence causal) 
ordering between changes in independent vari- 
ables and changes in the dependent variable. 
Third, even though African-American ethnicity 
itself does not change, it does have an impact on 
change in marijuana use, reinforcing the obser- 
vation that the exclusion of time-constant vari- 
ables from the conditional logistic regression 
model described in the previous section may 
be problematic. Fourth, aside from the influ- 
ence of African-American ethnicity on change 
in marijuana use and the sheer magnitude of the 
standardized coefficients and explained varia- 
tion, these results are fairly similar to cross- 
sectional models for marijuana use using the 
same predictors and prevalence of marijuana 
use as a dependent variable. Using raw scores 
instead of change scores for exposure and belief 
in the present example (results not shown in 
detail here) merely produces lower levels of 
explained variation than those obtained using 
change scores. The model predicting change in 
marijuana use from change, rather than level, 
of exposure and belief appears to be reason- 
ably well specified; it just does not explain a 
great deal of variation in the change in mari- 
juana use. Fifth, as noted earlier in this chapter, 
there are other codings that could be used for 
change in marijuana use; for all of them, the 
unconditional change model seems best suited 
to analysis of change from one time period 
to a subsequent period, that is, to the analy- 
sis of only two periods rather than multiple 
periods. With data from multiple periods, how- 
ever, one could use a multiwave panel analysis 
approach parallel to that described by Finkel 
in Chapter 29, with logistic regression for the 
unconditional change model, possibly incorporating
chronological or calendar time in addition
to or instead of age as a control variable.
This possibility will be considered further at
the end of this chapter. 

Finally, despite their apparent similarities, 
the models in Table 30.1 and the first row of 
Table 30.2 are different in a very important way. 
The model in Table 30.1 assumes (correctly or 
incorrectly) that (1) there is unmeasured hetero- 
geneity in the respondents, (2) the form taken 
by this unobserved heterogeneity is adequately 
reflected in differences in intercepts, represent- 
ing unmeasured interindividual differences in 
the distribution of the probability of marijuana 
use, and not in the coefficients, which would 
represent unmeasured interindividual differ- 
ences in the effects of exposure and belief on 
marijuana use; and (3) the exclusion of cases 
with no change, and predictors which are either 
constant over time (gender and ethnicity) or 
whose changes are invariant across individuals 
(age) does not bias the results. In contrast, the 
models in Table 30.2 assume either that there 
is no significant unmeasured heterogeneity 
among the respondents that would be reflected 
in either intercept or slope parameters (equal 
slopes are also assumed in the model in 
Table 30.1), or that we are not concerned with 
such individual variation (e.g., because we 
are planning an intervention at the population 
rather than the individual level, and it is the 
average effects across individuals, not the 
individual-level effects that count). Even given 
this, the results from the conditional logistic 
regression fixed effects model render it suspect, 
given past research in this area. Better resolu- 
tion of this issue might be possible with addi- 
tional waves of data, allowing the modeling of 
individual trajectories (not just intercepts) and 
perhaps random slope coefficients; that pos- 
sibility is addressed very briefly at the end of 
this chapter, but see also Chapter 33 in this vol- 
ume (Menard) regarding the use of multilevel 
longitudinal analysis of categorical dependent 
variables. 


7 Logistic regression for the 
conditional change model 


For the conditional change model, four models 
were calculated, paralleling the models for the 
unconditional change model, and they are pre- 
sented in Table 30.3. The dependent variable is 
current marijuana use in wave 5 (1980) of the 
NYS. The predictors are exposure (not change 
in exposure), belief (not change in belief), age, 
gender, and ethnicity, plus prior marijuana use. 
In this analysis, exposure and belief are mea- 
sured at the same time as prior marijuana use 
(1979), temporally prior to the dependent vari- 
able, current (1980) marijuana use. The tempo- 
ral ordering of the presumed cause and effect is 
thus relatively unambiguous. It would be pos- 
sible to include change scores for exposure and 
belief from 1979 to 1980 in place of levels of 
exposure and belief in 1979; the use of the lev- 
els as opposed to the change scores is, how- 
ever, the more common practice in conditional 
change models, in part because these models 
can be interpreted not only as models of change 
in the dependent variable, but also as models of 
the level of the dependent variable that simply 
include the lagged dependent variable (in this 
instance prior marijuana use) as a predictor. 
The first column in Table 30.3 specifies the dependent variable, as in previous tables. The
second column includes R_L², ΔR_L², and R_O², where ΔR_L² is the change in R_L² that occurs
when the other predictors are added to a model that already includes prior marijuana use. In
effect, ΔR_L² is the likelihood ratio partial multiple squared correlation: (a) likelihood ratio
(using R_L²), (b) partial (controlling for prior marijuana use), (c) multiple (since it involves
two or more predictors), (d) squared correlation (really explained variation) between the
set of predictors and current marijuana use, controlling for prior marijuana use. Comparing
ΔR_L² in Table 30.3 with R_L² in Table 30.2, the two are of a similar order of magnitude, with
ΔR_L² being very slightly larger, a common result in comparing unconditional and conditional
change models. As a partial correlation, ΔR_L² is a very conservative estimate of the impact of
the predictors on the dependent variable.


Table 30.3 Conditional change models for marijuana use

Dependent    R_L² / ΔR_L² / R_O²   Independent variables    b*       b        p(b)
variable                                                                      (Wald)

Marijuana    .401 / .042 / .482    Exposure                 .144     .088     .000
use                                Belief                  -.187    -.129     .000
                                   Prior marijuana use      .402    2.270     .000
                                   Intercept                 --      .960     .192

             .402 / .042 / .482    Exposure                 .143     .087     .000
                                   Belief                  -.194    -.133     .000
                                   Age                     -.025    -.036     .368
                                   Prior marijuana use      .489    2.749     .000
                                   Intercept                 --     1.657     .121

             .404 / .043 / .482    Exposure                 .146     .088     .000
                                   Belief                  -.199    -.136     .000
                                   Gender (male)           -.027    -.151     .327
                                   Ethnicity:
                                     African-American       .045     .345     .092
                                     Other                  .017     .195     .536
                                   Prior marijuana use      .489    2.737     .000
                                   Intercept                 --     1.155     .121

             .404 / .044 / .482    Exposure                 .144     .087     .000
                                   Belief                  -.205    -.140     .000
                                   Age                     -.026    -.037     .358
                                   Gender (male)           -.028    -.152     .323
                                   Ethnicity:
                                     African-American       .045     .348     .089
                                     Other                  .015     .182     .567
                                   Prior marijuana use      .495    2.766     .000
                                   Intercept                 --     1.870     .083

Turning to the individual predictors, the sub- 
stantive results are similar but not identical to 
the results for the unconditional change model. 
First, prior marijuana use is the strongest influ- 
ence in each of the models in Table 30.3, 
followed by belief (with standardized coeffi- 
cients b* = —.187 to —.205), then exposure (b* = 
.143 to .146), and all three of these predictors 
are statistically significant in all four models. 
Being African-American is the only other influ- 
ence to attain even marginal significance, with 


b* = .083 to .087, p = .089 to .092. In con- 
trast to the results for the unconditional change 
model, age does not appear to be a statistically 
significant influence on marijuana use. Note, 
however, that age was a very weak (b* < .100) 
influence in the unconditional change model, 
and would have been regarded as substantively 
nonsignificant, even though it was statistically 
significant. The coefficients for exposure are 
comparable for the conditional and uncondi- 
tional models, but the conditional model sug- 
gests that the influence of belief is substan- 
tively as well as statistically significant (and 
stronger than the influence of exposure), while 
the unconditional model indicates that the
influence of belief is substantively nonsignificant
and weaker than the influence not only of
exposure but also of being African-American. 
Bear in mind that, in principle, these differ- 
ences in results could reflect not only the dif- 
ferences between dependent variables in the 
conditional and unconditional change model, 
but also differences in the predictors (levels as 
opposed to changes in exposure and belief). 
As indicated earlier, however, the results of 
the unconditional change model are similar 
regardless of whether levels or changes in 
belief and exposure are used as predictors, with 
slightly lower explained variation using lev- 
els as opposed to changes in belief and expo- 
sure in the unconditional change model, so 
it seems more likely that it is not the differ- 
ences in how the predictors are measured, but 
rather the differences in the operationalization
of change in the dependent variable, that
make the difference between the conditional
and unconditional change models here. 
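
One way to read unstandardized coefficients from a conditional change model is to exponentiate them, which gives the multiplicative change in the odds of current use for a one-unit increase in the predictor, holding the others constant. The Python sketch below does this for three coefficients as printed for the fourth model in Table 30.3. The helper name is hypothetical, and the units of exposure and belief are whatever scales the original analysis used.

```python
import math

# Unstandardized coefficients b as printed for the fourth model in Table 30.3
coefficients = {
    "Exposure": 0.087,
    "Belief": -0.140,
    "Prior marijuana use": 2.766,
}

def odds_ratio(b, units=1.0):
    """Multiplicative change in the odds for a `units` increase in the predictor."""
    return math.exp(b * units)

for name, b in coefficients.items():
    print(name, round(odds_ratio(b), 3))
```

On these figures, prior use multiplies the odds of current use roughly sixteenfold. The odds ratios for exposure and belief look close to 1, but they refer to single scale points on multi-point scales, which is one reason the chapter relies on standardized coefficients to compare predictors.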


8 Extensions to polytomous 
dependent variables 


The conditional and unconditional change 
models are readily extended to the analysis of 
polytomous dependent variables. In the uncon- 
ditional change model for polytomous depen- 
dent variables, the question is how to code 
change in the dependent variable and in any 
of the time-varying categorical predictors, an 
issue already addressed above. In the condi- 
tional change model, the question is how to 
code the lagged endogenous variable. For a 
nominal polytomous dependent variable, the 
baseline logit model compares each category 
with a baseline category. This means that dupli- 
cating the coding of the dependent variable for 
the lagged dependent variable results in missing 
cases for each of the dummy lagged variables, 
and the model cannot be calculated. Instead, 
some other contrast will be necessary. One pos- 
sibility here is to use indicator coding (1 for 


the category in question, zero for all other cate- 
gories, not just for the reference category) for the 
predictors, but other contrasts (e.g., deviation 
or effect coding; see Menard, 2002a for a more 
extended discussion of contrasts for categori- 
cal predictors in logistic regression analysis) are 
also possible. For ordinal polytomous depen- 
dent variables, whether the coding of the lagged 
endogenous variable can be identical to the cod- 
ing of the dependent variable depends on which 
of the ordinal logistic regression models is spec- 
ified. In particular, for the cumulative logit 
model, which uses all of the cases for each of 
the contrasts, it is possible to code the dummy 
variables representing the lagged endogenous 
variable Y_{t-1} identically to the dummy variables
representing the dependent variable Y_t.
Alternatively, one can treat the lagged endogenous
variable Y_{t-1} either as a set of unordered
dummy variables with an indicator (or other)
contrast or as an interval predictor,
but this latter strategy raises the issue of
whether, if Y can be treated as an interval vari- 
able, logistic regression is the most appropriate 
approach to its analysis. 
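
Indicator coding of a lagged polytomous variable can be sketched in Python as follows. The category labels, the choice of reference category, and the helper name are hypothetical. The point is only that every case receives a valid 0 or 1 on every dummy variable, so no cases are lost.

```python
def indicator_code(value, categories):
    """One 0/1 dummy per listed category: 1 for the category in question,
    0 for all other categories (not just for the reference category)."""
    return {f"lag_{c}": int(value == c) for c in categories}

# Hypothetical three-category nominal outcome measured at two waves
categories = ["a", "b", "c"]
reference = "a"                                   # omitted category
dummy_categories = [c for c in categories if c != reference]

panel = [
    {"y_prev": "a", "y_now": "b"},
    {"y_prev": "b", "y_now": "b"},
    {"y_prev": "c", "y_now": "a"},
]

for row in panel:
    row.update(indicator_code(row["y_prev"], dummy_categories))
    print(row)
```

A case in category "c" at the earlier wave is coded 0 on the dummy for "b" rather than being treated as missing. Treating such cases as missing is the problem that arises when the baseline contrasts of the dependent variable are duplicated for the lagged variable.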


9 Multiwave logistic regression 
panel models 


So far, we have been dealing only with two- 
wave panel models, models in which measure- 
ment of the dependent variable (and possibly 
the predictors) occurs at two distinct times. It is 
also possible to have a two-wave panel model 
in which more than one variable is treated as 
a dependent variable, at least with respect to 
some of the other variables in the model. For 
example, it is possible that marijuana use not 
only is influenced by, but also influences, expo- 
sure to delinquent friends and belief that it 
is wrong to violate the law. In the terminol- 
ogy of path analysis, belief, exposure, and mar- 
ijuana use may all be treated as endogenous 
variables with respect not only to gender and 
ethnicity, but also with respect to one another,
in a two-wave or multiwave panel model. As
with any other path analytic model, we must 
be concerned with method of estimation and 
model identification. If we build all possible 
influences into the model, the model will be 
underidentified, and we will not be able to cal- 
culate the path coefficients for the model until 
we impose some constraints upon the model. 
One common constraint is to limit the influ- 
ence of an independent variable to the wave 
immediately following the wave for which the 
independent variable was measured, and to 
assume that any impact of the independent vari- 
able on subsequent waves occurs as a result 
of, or in a sense is filtered through, the vari- 
ables in the immediately following wave on 
which the independent variable has an effect. 
For example, if exposure has an effect on mar- 
ijuana use, it may be assumed that exposure 
at time 1 directly affects marijuana use at time 
2, but does not affect marijuana use at time 3, 
independent of its effect on exposure at time 2 
and marijuana use at time 2 (which in turn may 
affect marijuana use at time 3). Another commonly
imposed constraint for a small number of
waves (e.g., 1-5) is that
the effect of one variable, for example belief,
on another variable, for example exposure, is
constant regardless of the pair of
adjacent waves in which the variables are measured.
Alternatively, one can allow for the possi- 
bility that the impact of an independent vari- 
able on a dependent variable varies over time 
or age, either by building interaction terms 
between time or age and other variables into 
the model, or if none of the coefficients is con- 
strained to be equal over time or age, by sep- 
arate estimation of each equation (as opposed, 
for example, to using a pooled TSCS approach). 
Separate estimation of multiple equations in a 
system of structural equations was more com- 
mon in the earlier days of path analysis, and 
has largely fallen out of favor with the devel- 
opment of simultaneous estimation techniques 
such as LISREL for structural equation models. 


This is because the use of full-information max- 
imum likelihood techniques allows more effi- 
cient use of valid theory, resulting in more 
precise parameter estimates; but even for struc- 
tural equation models for quantitative variables, 
separate estimation does have the advantage 
that errors in one part of the model are not 
propagated throughout the rest of the model 
(Heise, 1975). For logistic regression analysis, 
the absence of simultaneous equation estima- 
tion techniques parallel to those for quantita- 
tive dependent variables is an additional reason 
for considering a separate estimation approach 
for structural equation or path analysis models 
using logistic regression (Menard, 2004b). 
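
Separate estimation under the adjacent-wave constraint amounts to assembling one data set per dependent variable and wave, each containing only predictors measured at the immediately preceding wave. The Python sketch below shows that data preparation step only. The panel layout, variable names, and values are hypothetical, and the regressions themselves would be fitted with whatever software is in use.

```python
def lagged_rows(panel, outcome, predictors, wave):
    """Rows for the equation predicting `outcome` at `wave` from the lagged
    outcome and lagged predictors measured at `wave - 1` only."""
    rows = []
    for person in panel:
        now, before = person.get(wave), person.get(wave - 1)
        if now is None or before is None:
            continue  # a case needs valid data at both adjacent waves
        row = {"y": now[outcome], "lag_y": before[outcome]}
        row.update({f"lag_{name}": before[name] for name in predictors})
        rows.append(row)
    return rows

# Hypothetical panel: one dict per person, keyed by wave number
panel = [
    {1: {"use": 0, "exposure": 10, "belief": 20},
     2: {"use": 0, "exposure": 12, "belief": 19},
     3: {"use": 1, "exposure": 15, "belief": 17}},
    {1: {"use": 1, "exposure": 16, "belief": 15},
     2: {"use": 1, "exposure": 16, "belief": 15}},   # no data at wave 3
]

variables = ("use", "exposure", "belief")

# One data set per endogenous variable and wave, each estimated separately.
for wave in (2, 3):
    for outcome in variables:
        predictors = [v for v in variables if v != outcome]
        print(wave, outcome, lagged_rows(panel, outcome, predictors, wave))
```

Because each equation is built and estimated on its own, a misspecified equation for, say, belief does not affect the estimates in the equation for marijuana use.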

Conditional change models for dichotomous 
and ordinal variables can also be implemented 
in the structural equation modeling framework, 
using polychoric correlations and weighted least 
squares (Jöreskog and Sörbom, 1989) as an
alternative to the logistic regression framework. 
Another consideration in two-wave or multi- 
wave panel models with multiple endogenous 
variables is that we do not have truly indepen- 
dent measures from one wave to the next, but 
instead have observations that are nested within 
individuals (and possibly within primary sam- 
pling units as well). Knowledge of this poten- 
tial source of dependency in the data suggests 
that the use of robust standard error estimates 
or generalized estimating equations (Hardin and 
Hilbe, 2003; see also Hilbe and Hardin, Chapter 
28, in this volume) should be considered. 

With five or more waves of data, other 
solutions become available. The time dimen- 
sion can be explicitly incorporated into the 
model, including nonlinear functions of time, 
in multilevel analysis using the general lin- 
ear model (e.g., Raudenbush and Bryk, 2002; 
see also Menard, Chapter 33, in this vol- 
ume). Another possibility is the use of dis- 
crete time event history analysis (e.g., Singer 
and Willett, 2003; Yamaguchi, 1991; see also 
Keiley et al., Chapter 27, in this volume) to 
model not only whether but also when an
outcome occurs. It remains possible, particularly
if one is interested in modeling relationships
that are conceptually reciprocal (exposure
influences marijuana use and marijuana use 
influences exposure), to use separate estima- 
tion of recursive models for each time-specific 
dependent variable to explore more complex 
causal structures, parallel to those analyzed 
using OLS regression (for separate estimation) 
or maximum likelihood structural equation 
models (for simultaneous estimation). As noted 
above, separate estimation has largely fallen 
out of favor for models with continuous inter- 
val/ratio dependent variables, for which max- 
imum likelihood structural equation modeling 
is now the standard, but simultaneous equation 
techniques for estimation of multiwave panel 
models with categorical dependent variables 
are not well developed or readily available in 
existing statistical software, and separate rather 
than simultaneous estimation of equations may 
be the most practical solution at present. 


10 Conclusion 


The use of logistic regression in longitudinal 
panel analysis poses all of the problems asso- 
ciated with the use of ordinary least squares 
linear regression or related techniques in lon- 
gitudinal analysis, plus a few more. As with 
linear regression, there are questions of (1) 
whether to use a conditional or an uncon- 
ditional change model, and (2) whether to 
measure predictors as levels (the predictor measured
at a single time) or change scores. In addition,
for longitudinal logistic regression models,
we must decide how to measure change in qualitative
dependent variables (as dichotomies, as trichotomies,
or by considering each possible change
separately), and whether to consider each possible
type of continuity separately. For much
the same reasons that the conditional change 
model is more generally applicable in linear 
panel analysis, and for other reasons as well, the 
conditional change model seems generally to be 


the best option for data involving a large num- 
ber of cases and relatively few time periods, 
either dichotomous or polytomous (nominal 
or ordinal) data with relatively few categories, 
and the use of logistic regression analysis. The 
conditional change model is also consistent 
with approaches taken in analyzing data with 
a larger number of periods, particularly mul- 
tilevel change models. This is not to say that 
one should avoid the use of other models for 
longitudinal analysis of data with a small num- 
ber of periods, only that one should probably 
begin with the conditional change model and 
consider whether the data, model assumptions, 
or other concerns provide sufficient reason 
for selecting a different model for longitudinal 
panel analysis using logistic regression. 


Software 


SAS, SPSS, Stata, and other general-purpose 
statistical software packages include routines 
for dichotomous and polytomous logistic 
regression analysis. None is completely satis- 
factory, and none generates all of the coeffi- 
cients used in the present chapter without some 
additional calculation by hand; see Menard 
(2000, 2002b, 2004a) for details on calculation 
of standardized logistic regression coefficients 
and R_L² and R_O² from SAS and SPSS output.


Chapter 31


Latent growth curve models 
Michael Stoolmiller 


In this chapter, we take up the subject of mod- 
eling change over time by building models for 
growth curves using a structural equation mod- 
eling (SEM) approach. We start by defining 
growth curves and introducing the terminol- 
ogy for describing them. Next, we describe the 
current available alternatives for fitting growth 
curve models, their strengths, and limitations. 
Then we introduce the basic latent growth 
curve (LGC) model and walk through the fun- 
damental steps of specification, identification, 
and estimation using an empirical example 
drawn from developmental psychology. Model 
identification involves some algebraic manipu- 
lations and readers not interested in this level 
of detail can safely skip these sections. Next, we 
elaborate the basic LGC model to include pre- 
existing predictors of future growth and repeat 
the steps of specification, identification, and 
estimation. Finally, we extend the LGC model 
again to demonstrate how one can not only 
include predictors of growth but also growth as 
a predictor of a distal outcome. The empirical 
example used throughout illustrates not only a 
successful application of the methodology but 
also some of the problems that users are likely 
to encounter when they begin to fit LGC models. 

A growth curve describes how a dependent 
variable, say y, depends on the passage of 
time, where time is considered the independent 
variable. In other words, a growth curve is a 
function that takes time as input and returns 


y values. The function has some mathemati- 
cal form, for example linear, as illustrated in 
Figure 31.1, and then parameters that control 
the exact type of the chosen functional form. 
For example, the parameters for a straight line 
are intercept and slope, which define a particu- 
lar straight line and distinguish it from other 
straight lines with different values of intercept 
and slope. The intercept of a linear growth 
curve is defined as the value of the dependent 
variable, y, when time, the independent vari- 
able, is equal to zero, as shown in Figure 31.1 
where the line intersects the y axis. The y axis 
is drawn in this case at the point where time 
is equal to zero but this is not always true or 
necessary. The slope of a linear growth curve is 
defined as the amount of change in the dependent
variable, y, for a unit increase in time, the
independent variable, as shown in Figure 31.1
by the two black arrows.

What is illustrated in Figure 31.1 is a mathe- 
matical abstraction, a straight line. In actual 
practice, it is not usually the case that the 
dependent variable is continuously observed as 
implied by the plot in Figure 31.1. More typi- 
cally, an empirical growth curve consists of 
a set of repeated measurements of the depen- 
dent variable taken at discrete time intervals. 
Real world data from the social sciences also 
rarely conform to deterministic mathematical 
abstractions such as straight lines. More likely,
a growth curve based on 4 repeated assessments
would look like Figure 31.2.

Figure 31.2 Fitted growth curve (dashed line) for
outcome y with observed data (black dots). Epsilons
(ε) are time-specific influences

The 4 observed y values are shown by solid 
black circles. The dashed line is the hypotheti- 
cal straight line growth curve fit to the observed 
data. The discrepancies between the observed 
values and the values implied by the fitted 
straight line are shown as blue arrows pointing
away from the line to the observed data.
These discrepancies, denoted as ε1 to ε4 in
Figure 31.2, are known as time-specific influences
or sometimes also as residual influences.


The impact of the time-specific influences is 
to mask the underlying linear growth curve. 
These time-specific influences include random 
measurement error but they could also include 
predictable influences that operate only at a 
specific point in time. The observed y scores
are equal to the fitted growth curve scores at
each time point plus ε at each time point and
so they do not follow the nice linear progression
indicated by the fitted line. Note that
the y2 score is actually less than the y1 score
in Figure 31.2. There is an obvious correspondence
of the growth curve in Figure 31.2 to
observed variable regression methodology.¹
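The correspondence to ordinary regression can be made concrete with a minimal sketch. The four y values below are hypothetical (chosen so that y2 is less than y1, as in Figure 31.2) and are not data from the chapter; the function name `fit_line` is likewise only illustrative.

```python
# Ordinary least squares fit of a straight line to one subject's repeated measures.
# The y values are hypothetical illustrations, not data from the chapter.

def fit_line(times, ys):
    """Return (intercept, slope) of the OLS line through the points (times, ys)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope

times = [0, 1, 2, 3]
ys = [2.0, 1.5, 3.5, 4.0]  # hypothetical; note that y2 < y1

intercept, slope = fit_line(times, ys)
fitted = [intercept + slope * t for t in times]
# Observed minus fitted: the time-specific (residual) influences.
residuals = [y - f for y, f in zip(ys, fitted)]

print(intercept, slope)
print(residuals)
```

The residuals play the role of the time-specific influences: adding them back to the fitted values reproduces the observed scores exactly.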

In growth curve modeling, it is typically the 
case that instead of just one growth curve, 
we have a sample of growth curves, one for 
each subject. To illustrate how this expands the 
modeling possibilities, let us assume we have 
3 individuals measured at 4 time points and 
we know their intercept, slope, and observed y 
values. Their data and fitted growth curves are 
shown below in Figure 31.3. Suppose indivi- 
dual 1 has an intercept score of 1 and a slope 
score of 1. This individual’s fitted growth curve 
scores over the 4 times are 1, 2, 3, and 4 and 
are shown below in the upper left corner of 
Figure 31.3. Subject 2 has an intercept score of
4 and a slope score of -1 and their fitted values
are 4, 3, 2, and 1 and the curve is shown
in upper right of Figure 31.3. Subject 3 has an 
intercept score of 2 and a slope score of 0 and 
their fitted values are 2, 2, 2, and 2 and the curve 
is shown in lower left of Figure 31.3. In addition, 
for each subject, their time-specific values, ε1 to
ε4, are shown as arrows pointing away from the


1In fact, we could choose to fit the straight line
growth curve by regressing the observed y values on 
the time values using ordinary least squares regres- 
sion. This approach has the appeal of simplicity and 
is useful for preliminary graphical analyses as we 
will demonstrate but it has the disadvantage of being 
less statistically efficient than fitting growth curves 
using maximum likelihood techniques. 
fitted growth curve. The mean of the intercept
and slope scores for these 3 subjects would be
(1 + 4 + 2)/3 = 2.33 and (1 - 1 + 0)/3 = 0, respectively.
The variance of the intercept and slope scores for
these 3 subjects would be [(1 - 2.33)² + (4 - 2.33)² +
(2 - 2.33)²]/3 = 1.56 and [(1 - 0)² + (-1 - 0)² +
(0 - 0)²]/3 = 2/3, respectively. The covariance of
the intercept and slope for these 3 subjects is
equal to -1.

Figure 31.3 Fitted growth curves for 3 individuals (solid lines) for outcome y. Epsilons are time-specific influences
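A quick numerical check of the summary statistics for the three subjects in Figure 31.3 may be useful. This is only a sketch: it uses the divisor n = 3 throughout (population formulas), and the helper names are illustrative.

```python
# Summary statistics for the three illustrative subjects of Figure 31.3.
intercepts = [1, 4, 2]
slopes = [1, -1, 0]

def mean(xs):
    return sum(xs) / len(xs)

def pop_cov(xs, ys):
    """Covariance with divisor n (not n - 1); pop_cov(x, x) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

mean_int = mean(intercepts)                 # 7/3, about 2.33
mean_slope = mean(slopes)                   # 0
var_int = pop_cov(intercepts, intercepts)   # 14/9, about 1.56
var_slope = pop_cov(slopes, slopes)         # 2/3
cov_is = pop_cov(intercepts, slopes)        # -1

print(mean_int, mean_slope, var_int, var_slope, cov_is)
```

These five numbers are exactly the quantities that the latent growth curve model estimates for a whole sample: two means, two variances, and one covariance of the growth factors.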

Now an obvious question to a developmental 
psychologist is what causes the individual dif- 
ferences in both initial status (intercept scores) 
and growth rate (slope scores) over time? Why 
did subject 1 go up? Why did subjects 2 and 3 
go down and stay the same, respectively? Did 
these subjects differ on some important back- 
ground variable at time zero or earlier that was 
responsible for the developmental differences 
in change over time? LGC models can help 
answer these kinds of questions. 

The SEM approach to growth curve models, 
however, is not the only available alternative. 
Here we digress briefly to compare the cur- 
rently available major alternative so that read- 
ers can make informed decisions about which 
approach best meets their needs. The major 
alternative for fitting growth curve models goes 
by a variety of terms, including the multi- 
level model (MLM; e.g., Goldstein, 2003), the 
random effect model (REM; e.g., Laird and 
Ware, 1982), the mixed effect model (MEM; e.g., 


Pinheiro and Bates, 2000) or the hierarchical 
linear model (HLM; e.g., Raudenbush and Bryk, 
2002) depending on author or software package. 

There are two important differences that 
readers may want to consider before embarking 
on a growth curve analysis. The first major 
difference between most alternatives and LGC 
is that LGC is capable of estimating compli- 
cated structural models between pre-existing 
predictors, growth, and future outcomes of 
growth, including testing for mediation or 
indirect effects that may flow from pre-existing 
predictors to growth and then to some distal 
outcome. In contrast, the alternatives treat 
growth as the ultimate and only outcome, 
making it difficult to extend the model to use 
the growth patterns themselves as predictors or 
examine indirect effects. The empirical exam- 
ple used in this chapter illustrates this aspect of 
LGC but for another example, see Stoolmiller, 
Duncan, Bank and Patterson (1992). 

On the other hand, the second major differ- 
ence is that LGC takes a multivariate approach 
to repeated measures within subjects, which 
means that the number of repeated measures 
and the balance of the repeated assessment 
design can be a concern, especially in small 
samples. One strategy for dealing with imbalance
in the design (i.e., different time intervals
for different individuals) for LGC analyses is 
to create more repeated assessments but set the 
data to missing for individuals not observed at
that specific interval. As the number of repeated 
assessments gets large, however, and begins to 
approach the number of individuals in the data, 
the estimation procedures can become unstable 
and ultimately break down. In this situation, 
multilevel models may be a better choice.²

Returning now to the LGC, shown below in 
Figure 31.4 is a path diagram for a simple linear 
growth SEM with 4 observed variables that rep- 
resent repeated assessments of the same mea- 
sure at 4 time points. Notation follows closely 
that developed by Muthén and Muthén (2004)
for their Mplus package, which was used to
estimate all models in this chapter. The model 
assumes that the time intervals between assess- 
ments are the same for all subjects although not 
all subjects have to have all 4 assessments. Sub- 
jects with only 3, 2, or even just 1 assessment 
can still be included in the model.³


2Mplus, the program used for all analyses in this 
chapter, is an exception to this rule and can handle 
highly imbalanced data in the same manner as the 
alternative programs. 

3With longitudinal data, it is often the case that a sub- 
stantial portion of the sample has only partial data 
and a model that required every subject to have all 4 
assessments would not be very useful. Most SEM pro- 
grams have options for using cases with partial data 
and this option should be the default choice. Using 
cases with partial data reduces potential biases, maxi- 
mizes statistical power and is almost always superior 
to older ad hoc approaches to dealing with miss- 
ing data, including using only subjects with com- 
plete data or subjects with some minimum number of 
assessments (e.g., 2 out of 4) (Schaffer and Graham, 
2002). In fact, so long as the fact that the data are 
missing does not depend on the value of the miss- 
ing data, estimation can proceed using cases with 
partial data without introducing bias relative to the 
target population from which the sample was drawn. 
This condition is usually referred to as ignorable missingness.
Keep in mind, however, that even if ignorable
missingness does not hold, using only subjects
with complete data or some minimum number of
assessments will usually result in even more bias.


Figure 31.4 Path diagram for linear growth curve 
model 


The model has two latent variables, which 
are labeled Intercept and Slope. The intercept 
and slope factors represent the collection of 
intercepts and slopes for each individual subject’s
linear growth curve. The intercept and
slope have means, α1 and α2, and variances,
ψ11 and ψ22, respectively, and a covariance, ψ12.
The quantities α1 and α2 correspond to the
means of the intercept and slope scores, respectively,
computed for the example in Figure 31.3
above. The quantities ψ11 and ψ22 correspond to
the variances of the intercept and slope scores,
respectively, computed for Figure 31.3, and ψ12
corresponds to the covariance. Associated with
each observed y measure is a latent, time-specific
variable, ε. These correspond to the
discrepancies between the fitted growth curve 
scores and the observed y values as illustrated 
in Figures 31.2 and 31.3. The factor loadings for 
the intercept are fixed at 1 and for the slope, 
the values represent the linear passage of time, 
0, 1, 2, and 3. Because of the definition of the 
intercept of a growth curve as being the value 
of the outcome when time is zero, the Inter- 
cept factor in Figure 31.4 represents individual 
differences in the outcome, corrected for time- 
specific influences, at y1, the first time data was 
collected. If a different set of factor loadings 
had been employed, the intercept factor would 
represent true individual differences at a different
time. Say, for example, the factor loadings
of -2, -1, 0, and 1 were used. Then the intercept
factor would represent true individual differences
at y3, the third data collection point. Or if
the factor loadings of -3, -1, 1, and 3 were used,
the intercept factor would represent individual
differences midway between time points 2 and
3, a time point for which no data were actually
collected. This particular point is actually the
mean of the 4 fitted values over time, which
makes it an interesting choice for an intercept
factor. The fact that it is midway between y2
and y3 amounts to an interpolation but this is
reasonable given our choice of a linear growth
model. Another good choice for factor loadings
could be -3, -2, -1, and 0, which would
shift the intercept to represent true individual
differences at the last point of observation.
The choice of scaling for the slope, and hence 
the interpretation of the intercept, can be deter- 
mined by the substantive goals of the research. 
Choosing factor loadings such that the inter- 
cept is beyond the range of the observed data 
is less desirable because interpreting the results 
becomes more problematic. For example, sup- 
pose the factor loadings were 1, 2, 3, and 4. 
The intercept factor in this case represents true 
individual differences at time 0 which is 1 year 
prior to when any data were collected. Extra- 
polating linear models beyond the range of the 
data is risky and is not usually advisable. 
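The effect of the time coding on the meaning of the intercept can be checked with a small sketch. It reuses subject 1 from Figure 31.3 (fitted values 1, 2, 3, 4); the dictionary labels and the helper `line_through` are illustrative names only.

```python
# How the coding of time (the slope factor loadings) changes what the intercept means.
# Subject 1 from Figure 31.3: fitted values 1, 2, 3, 4 at the four assessments.
fitted = [1.0, 2.0, 3.0, 4.0]

def line_through(loadings, values):
    """Intercept and slope of the exact straight line through the points,
    computed from the first two of them (the points are exactly collinear)."""
    slope = (values[1] - values[0]) / (loadings[1] - loadings[0])
    intercept = values[0] - slope * loadings[0]
    return intercept, slope

codings = {
    "first assessment": [0, 1, 2, 3],
    "third assessment": [-2, -1, 0, 1],
    "midpoint": [-3, -1, 1, 3],
    "last assessment": [-3, -2, -1, 0],
}

results = {name: line_through(lam, fitted) for name, lam in codings.items()}
for name, (intercept, slope) in results.items():
    print(name, intercept, slope)
```

The fitted values never change; only the intercept does, taking the value of the curve wherever the loading is zero. Note that with loadings of -3, -1, 1, and 3 one loading unit is half an assessment interval, so the slope is also halved, and the intercept equals the mean of the four fitted values.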

We can make all of this more precise by 
writing down the structural equations for the 
model in Figure 31.4 for the ith individual. 
There is one equation for each dependent 
variable, y1 to y4, 


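\[
\begin{aligned}
y_{1i} &= \eta_{1i} + 0\,\eta_{2i} + \varepsilon_{1i} \\
y_{2i} &= \eta_{1i} + 1\,\eta_{2i} + \varepsilon_{2i} \\
y_{3i} &= \eta_{1i} + 2\,\eta_{2i} + \varepsilon_{3i} \\
y_{4i} &= \eta_{1i} + 3\,\eta_{2i} + \varepsilon_{4i}.
\end{aligned}
\tag{1}
\]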


Note how the intercept, η1, makes a constant
contribution to the y scores but the slope,
η2, makes a contribution that is scaled by the
fixed factor loadings according to the linear
passage of time. Note too that η1 just represents
individual differences at y1, corrected for time-specific
influences. In the most elementary and
standard LGC, and the only one discussed in 
this chapter, the latent variables are all assumed 
to be jointly multivariate normally distributed, 
which implies that the y’s are all multivariate 
normally distributed. LGC methods, however, 
have been extended to accommodate a variety 
of different types of variables (e.g., dichoto- 
mous, ordered categorical, count, and censored 
variables; see Muthén and Muthén, 2004).
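To see what the structural equations imply for data, one can simulate from them. The sketch below is a minimal illustration under the normality assumption just described; all parameter values, the seed, and the function name `simulate_lgc` are hypothetical and are not estimates from the chapter.

```python
import random
from math import sqrt

def simulate_lgc(n, alpha, psi11, psi22, psi12, theta, loadings=(0, 1, 2, 3), seed=1):
    """Draw n subjects from a linear latent growth curve model.

    alpha: (mean intercept, mean slope); psi11, psi22, psi12: variances and
    covariance of the growth factors; theta: common residual variance.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 covariance matrix of (eta1, eta2).
    l11 = sqrt(psi11)
    l21 = psi12 / l11
    l22 = sqrt(psi22 - l21 ** 2)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        eta1 = alpha[0] + l11 * z1
        eta2 = alpha[1] + l21 * z1 + l22 * z2
        # y_t = eta1 + loading_t * eta2 + time-specific influence
        data.append([eta1 + lam * eta2 + rng.gauss(0, sqrt(theta)) for lam in loadings])
    return data

# Hypothetical parameter values, for illustration only.
sample = simulate_lgc(n=2000, alpha=(2.0, 0.5), psi11=1.0, psi22=0.25, psi12=0.1, theta=0.5)
means = [sum(row[t] for row in sample) / len(sample) for t in range(4)]
print(means)  # should lie close to 2.0, 2.5, 3.0, 3.5
```

The simulated means rise by roughly the mean slope per assessment, which is the pattern the mean structure of the model predicts.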

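Now that we have specified the model, the
next task is to show that the model parameters
are identified.⁴ The means⁵ of the y variables
are

\[
\begin{aligned}
E(y_1) &= E(\eta_1 + 0\,\eta_2 + \varepsilon_1) \\
       &= E(\eta_1) \\
       &= \alpha_1
\end{aligned}
\tag{2}
\]

\[
\begin{aligned}
E(y_2) &= E(\eta_1 + 1\,\eta_2 + \varepsilon_2) \\
       &= E(\eta_1) + E(\eta_2) \\
       &= \alpha_1 + \alpha_2.
\end{aligned}
\tag{3}
\]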

From equation (2) it is easy to see that α1 is
identified and in fact is just equal to the mean
of y1. Given that α1 is identified, then α2 is
easily identified from equation (3) and in fact


4Identification is the process of demonstrating that
the available data is sufficient to uniquely determine 
each parameter in the model. For LGC, the available 
data consists of means, variances, and covariances of 
the observed variables. Thus, identification is show- 
ing that the model parameters can be uniquely deter- 
mined by the means, variances, and covariances of 
the observed variables. 

5Upper case E indicates the mathematical operation 
of computing expectation or the mean of a random 
variable. For a review of the rules of expectation, see 
Kirk (1982). 




is just the mean of y2 minus the mean of y1. 
The mean structure does not involve any other 
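model parameters so we must turn to the covariance
structure to identify the variances and
covariances of the latent variables.⁶ The variance
of y1 is

\[
\begin{aligned}
\operatorname{var}(y_1) &= \operatorname{var}(\eta_1 + \varepsilon_1) \\
  &= \operatorname{var}(\eta_1) + \operatorname{var}(\varepsilon_1) + 2\operatorname{cov}(\eta_1, \varepsilon_1) \\
  &= \operatorname{var}(\eta_1) + \operatorname{var}(\varepsilon_1).
\end{aligned}
\tag{4}
\]
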
The covariance of y1 with y2 is 


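\[
\begin{aligned}
\operatorname{cov}(y_1, y_2) &= \operatorname{cov}(\eta_1 + \varepsilon_1,\ \eta_1 + \eta_2 + \varepsilon_2) \\
  &= \operatorname{cov}(\eta_1, \eta_1) + \operatorname{cov}(\eta_1, \eta_2) + \operatorname{cov}(\eta_1, \varepsilon_2) \\
  &\quad + \operatorname{cov}(\varepsilon_1, \eta_1) + \operatorname{cov}(\varepsilon_1, \eta_2) + \operatorname{cov}(\varepsilon_1, \varepsilon_2) \\
  &= \operatorname{var}(\eta_1) + \operatorname{cov}(\eta_1, \eta_2).
\end{aligned}
\tag{5}
\]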


The covariance of y1 with y3 is 


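\[
\begin{aligned}
\operatorname{cov}(y_1, y_3) &= \operatorname{cov}(\eta_1 + \varepsilon_1,\ \eta_1 + 2\eta_2 + \varepsilon_3) \\
  &= \operatorname{cov}(\eta_1, \eta_1) + 2\operatorname{cov}(\eta_1, \eta_2) + \operatorname{cov}(\eta_1, \varepsilon_3) \\
  &\quad + \operatorname{cov}(\varepsilon_1, \eta_1) + 2\operatorname{cov}(\varepsilon_1, \eta_2) + \operatorname{cov}(\varepsilon_1, \varepsilon_3) \\
  &= \operatorname{var}(\eta_1) + 2\operatorname{cov}(\eta_1, \eta_2).
\end{aligned}
\tag{6}
\]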


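If we subtract equation (5) from equation (6)
we get

\[
\begin{aligned}
\operatorname{cov}(y_1, y_3) - \operatorname{cov}(y_1, y_2)
  &= \bigl[\operatorname{var}(\eta_1) + 2\operatorname{cov}(\eta_1, \eta_2)\bigr]
     - \bigl[\operatorname{var}(\eta_1) + \operatorname{cov}(\eta_1, \eta_2)\bigr] \\
  &= \operatorname{cov}(\eta_1, \eta_2),
\end{aligned}
\tag{7}
\]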


6The var and cov operators indicate mathematical
operations of computing variance or covariance for 
random variables. For a review of the rules of vari- 
ance or covariance, see Kirk (1982). 


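which identifies the covariance of η1 and η2,
ψ12. Back substituting the value of the covariance
of η1 and η2 into equation (5) we get

\[
\begin{aligned}
\operatorname{cov}(y_1, y_2) &= \operatorname{var}(\eta_1) + \operatorname{cov}(\eta_1, \eta_2) \\
  &= \operatorname{var}(\eta_1) + \operatorname{cov}(y_1, y_3) - \operatorname{cov}(y_1, y_2) \\
2\operatorname{cov}(y_1, y_2) - \operatorname{cov}(y_1, y_3) &= \operatorname{var}(\eta_1),
\end{aligned}
\tag{8}
\]

which identifies the variance of η1. The variance
of ε1 is then easily identified by equation (4).
The covariance of y2 with y3 is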


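\[
\begin{aligned}
\operatorname{cov}(y_2, y_3) &= \operatorname{cov}(\eta_1 + \eta_2 + \varepsilon_2,\ \eta_1 + 2\eta_2 + \varepsilon_3) \\
  &= \operatorname{cov}(\eta_1, \eta_1) + 2\operatorname{cov}(\eta_1, \eta_2) + \operatorname{cov}(\eta_1, \varepsilon_3) \\
  &\quad + \operatorname{cov}(\eta_2, \eta_1) + 2\operatorname{cov}(\eta_2, \eta_2) + \operatorname{cov}(\eta_2, \varepsilon_3) \\
  &\quad + \operatorname{cov}(\varepsilon_2, \eta_1) + 2\operatorname{cov}(\varepsilon_2, \eta_2) + \operatorname{cov}(\varepsilon_2, \varepsilon_3) \\
  &= \operatorname{var}(\eta_1) + 3\operatorname{cov}(\eta_1, \eta_2) + 2\operatorname{var}(\eta_2).
\end{aligned}
\tag{9}
\]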


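With the variance of η1 and covariance of η1
and η2 already identified, equation (9) identifies
the variance of η2. The variance of y2 is

\[
\begin{aligned}
\operatorname{var}(y_2) &= \operatorname{var}(\eta_1 + \eta_2 + \varepsilon_2) \\
  &= \operatorname{var}(\eta_1) + \operatorname{var}(\eta_2) + \operatorname{var}(\varepsilon_2) + 2\operatorname{cov}(\eta_1, \eta_2).
\end{aligned}
\tag{10}
\]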


The variance of ε2 is the only unidentified
parameter in equation (10) so the variance of ε2
is identified. Similarly, the variances of y3 and
y4 identify the variances of ε3 and ε4. Thus,
the model is completely identified. It is, in fact,
overidentified because we have 14 total degrees
of freedom from the 4 means, 4 variances and
6 covariances for the y’s and 9 estimated parameters,
leaving 5 degrees of freedom to test the
fit of the model. The entire mean and covariance
structure is shown below in Table 31.1.
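The identification argument can also be checked numerically. The sketch below builds the model-implied means and covariances from hypothetical parameter values (none are from the chapter) and then recovers each parameter from those moments using only the relations derived in equations (2) through (10); the function and variable names are illustrative.

```python
def implied_moments(loadings, alpha, psi11, psi22, psi12, theta):
    """Model-implied means and covariance matrix of the y's for a linear
    growth model; theta is a list of residual variances, one per time point."""
    k = len(loadings)
    means = [alpha[0] + lam * alpha[1] for lam in loadings]
    cov = [[0.0] * k for _ in range(k)]
    for r in range(k):
        for c in range(k):
            lr, lc = loadings[r], loadings[c]
            cov[r][c] = psi11 + (lr + lc) * psi12 + lr * lc * psi22
            if r == c:
                cov[r][c] += theta[r]
    return means, cov

# Hypothetical parameter values, for illustration only.
alpha = (10.0, 2.0)
psi11, psi22, psi12 = 4.0, 1.0, 0.5
theta = [3.0, 2.5, 2.0, 1.5]
means, cov = implied_moments([0, 1, 2, 3], alpha, psi11, psi22, psi12, theta)

# Recover the parameters from the moments, following the identification steps.
alpha1_rec = means[0]                                          # equation (2)
alpha2_rec = means[1] - means[0]                               # equation (3)
psi12_rec = cov[0][2] - cov[0][1]                              # equation (7)
psi11_rec = 2 * cov[0][1] - cov[0][2]                          # equation (8)
psi22_rec = (cov[1][2] - psi11_rec - 3 * psi12_rec) / 2        # equation (9)
theta1_rec = cov[0][0] - psi11_rec                             # equation (4)
theta2_rec = cov[1][1] - psi11_rec - psi22_rec - 2 * psi12_rec # equation (10)

# Degrees of freedom: observed moments minus free parameters.
n_moments = 4 + 4 * 5 // 2   # 4 means + 10 variances and covariances
n_params = 2 + 3 + 4         # 2 factor means, 3 factor (co)variances, 4 residual variances
df = n_moments - n_params
print(df)
```

Because every parameter can be written as a function of the observed means, variances, and covariances, the model is identified, and the count of moments minus parameters gives the 5 degrees of freedom.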

Table 31.1 Mean and covariance structure for 4 points in time linear growth model

        y1            y2                        y3                        y4
y1      Ψ11 + Θ11
y2      Ψ11 + Ψ12     Ψ11 + Ψ22 + 2Ψ12 + Θ22
y3      Ψ11 + 2Ψ12    Ψ11 + 2Ψ22 + 3Ψ12         Ψ11 + 4Ψ22 + 4Ψ12 + Θ33
y4      Ψ11 + 3Ψ12    Ψ11 + 3Ψ22 + 4Ψ12         Ψ11 + 6Ψ22 + 5Ψ12         Ψ11 + 9Ψ22 + 6Ψ12 + Θ44
Means   α1            α1 + α2                   α1 + 2α2                  α1 + 3α2

Notes: Ψ11 = var(η1), Ψ22 = var(η2), Ψ12 = cov(η1, η2), α1 = E(η1), α2 = E(η2), Θ11 = var(ε1), Θ22 = var(ε2), Θ33 = var(ε3), Θ44 = var(ε4)

To demonstrate growth curve modeling, we
will model growth in Deviant Peer Affiliation
(DPA) in the Oregon Youth Study (OYS) from
grade 4 to 8. The OYS started in 1984 with
10-year-old boys and their families and is a pre- 
dominantly white, working-class sample from 
high-risk neighborhoods in a mid-size city in 
Oregon. Details on the sample and measures can 
be found in Stoolmiller (1994). DPA has been 
implicated as a key proximal predictor of delin- 
quent behavior in adolescence from a number 
of theoretical perspectives. Thus it is interest- 
ing to determine how DPA develops prior to 
adolescence and identify childhood predictors 
of growth in DPA. Such predictors may in turn 
be important targets for interventions aimed at 
reducing adolescent delinquent behavior. One 
such set of predictors is father variables and 
in particular father socioeconomic status (SES). 
Thus, our example will be limited to the subset 
of 141 boys in the OYS who had a father figure 
available to participate in the project at grade 
4. We measured DPA in the OYS by child self- 
report, teacher report, and parent report taken 
when the boys were in grades 4, 6, 7, and 8. 
The two-year gap between the first two assess- 
ments departs from a simple linear progression 
but this can easily be adjusted in the model by 
using similarly scaled factor loadings such as 0, 
2, 3, and 4. 

The parent and teacher measures were indi- 
vidual items measured on a 3 point scale of 0, 
1 and 2. The child report items, however, were 
measured on 5 point scales of 1, 2, 3, 4, and 5. 
Arbitrary scaling differences such as these can 


result in the measure with the largest scale dom- 
inating the final construct score. To prevent 
this, a simple recoding scheme was employed 
that preserved the raw information necessary to 
model change over time. The child items were 
recoded to have zero as a bottom scale anchor 
by subtracting 1 and then were rescaled to have 
an upper anchor of 2 by multiplying by 1/2.
With this adjustment, the final DPA construct 
score was the average of the parent, teacher, 
and child measures, square root transformed 
to reduce skewness. For modeling purposes, 
the DPA scores were also multiplied by 100 to 
provide a more convenient scaling. A common 
practice with this type of data is to standardize 
all items at all points in time to a mean of zero 
and variance of 1 before computing construct 
scores. Clearly, this practice will not work for 
growth curve analysis because all information 
about mean level change and most of the 
information about variance shifts is discarded. 
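The recoding scheme can be sketched as follows. This is only one reading of the description above: it assumes a single score per reporter and applies the square root to the average before multiplying by 100. The function names are hypothetical, and the actual OYS scoring may differ in detail.

```python
from math import sqrt

def rescale_child_item(score):
    """Map a child item scored 1 to 5 onto the 0 to 2 range of the
    parent and teacher items: subtract 1, then multiply by 1/2."""
    return (score - 1) * 0.5

def dpa_score(parent, teacher, child_raw):
    """Average the three reporters, take the square root to reduce skewness,
    and multiply by 100. The order of operations is an assumption."""
    child = rescale_child_item(child_raw)
    return 100 * sqrt((parent + teacher + child) / 3)

# Hypothetical scores for one boy at one assessment.
example = dpa_score(parent=1, teacher=0, child_raw=3)
print(round(example, 2))
```

Because the rescaling is the same at every assessment, raw differences in level across time are preserved, which is exactly what standardizing within each time point would destroy.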

The observed DPA trajectories are displayed 
in Figure 31.5, sorted by average level and then 
linear trend within average level. The sorting 
helps to group similar trajectories in the same 
subpanel and leads to a better display that can 
be helpful in determining the basic functional 
form that will be necessary to model the trajec- 
tories. In this case, it appears as though straight 
lines would do a reasonably good job of rep- 
resenting the individual trajectories. It is also 
apparent that a number of subjects increased 
substantially over time (subplots in the bottom
two rows, far right column) and some subjects
decreased over time (subplots in the middle
two rows, far left column). Thus, there would
appear to be individual differences in change
to be explained.

Figure 31.5 Observed individual growth curves for Deviant Peer Affiliation (DPA) from grade 4 to 8 (axes: Deviant Peer Affiliation by Grade)

The scatterplot matrix of the 4 repeated 
assessments is shown in Figure 31.6. The matrix 
has normal quantile plots on the main diago- 
nal to help determine if the variables follow a 
Gaussian distribution. As can be seen, there is 
some evidence of a floor effect, a number of 
cases all clumped at the bottom of the scale, at 
each time point, but the number is not exces- 
sive. In addition, skewness (denoted sk in the 
top margin of the normal quantile plot) is mini- 
mal and kurtosis (denoted k in the top margin of 
the normal quantile plot) is modest but consis- 


tently negative indicating lighter tails than the 
Gaussian distribution. No consistent nonlinea- 
rities appear in any of the scatterplots as indi- 
cated by the scatterplot smoother (solid line), 
which tracks the fitted linear regression line 
(dashed line) quite closely. Thus, there is no 
indication in the data that modeling should not 
proceed. 

The means and standard deviations, which 
are printed in the top margin of the normal 
quantile plots (m denotes mean, sd denotes 
standard deviation), increase steadily from 
grade 4 to 8. Individual differences in linear 
trends imply changing variances over time so 
the fact that the standard deviations change 
(i.e., increase) suggests that there are individual 
differences in change to explain, consistent 
with the plot of the trajectories in Figure 31.5. 


Figure 31.6 Scatterplot matrix for repeated assessments of Deviant Peer Affiliation, grade 4 to 8 (M = mean,
sd = standard deviation, sk = skewness, k = kurtosis, r = correlation, b = regression weight, t = t test, p = p
level of t test, N = sample size) with normal quantile plots on the main diagonal
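The result of fitting the following linear
growth curve model:

\[
\begin{aligned}
y_{1i} &= \eta_{1i} + \varepsilon_{1i} \\
y_{2i} &= \eta_{1i} + 2\,\eta_{2i} + \varepsilon_{2i} \\
y_{3i} &= \eta_{1i} + 3\,\eta_{2i} + \varepsilon_{3i} \\
y_{4i} &= \eta_{1i} + 4\,\eta_{2i} + \varepsilon_{4i}
\end{aligned}
\]
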


is shown in Table 31.2 (labeled model 1) along 
with two additional growth models that are 
simplifications of model 1. Note that the factor 
loadings were set to match the spacing of the 
assessments. In addition, the variances of the 
time-specific influences were constrained to be 
equal, a common simplifying assumption. 

The model 1 chi-square is 3.51 with 8 degrees
of freedom (p = .90) and the nonsignificant
p value indicates that the model-implied
covariance matrix and mean vector adequately
reproduce their observed counterparts. The chi-square
is a “badness of fit” test statistic that
for a fixed sample size gets bigger as the model 


and data become increasingly discrepant. For 
a fixed degree of model fit, the chi-square also 
gets bigger as the sample size increases much 
the same as a t test for a difference in means 
increases with sample size. Thus even small 
discrepancies between the model and the data 
become “significant” given a large enough sample.
A number of fit indices that are not functions
of the sample size are also shown in
Table 31.2: the comparative fit index (CFI),
the Tucker-Lewis fit index (TLI), and the root
mean square error of approximation (RMSEA).
The CFI and TLI will both be close to 1 for 
models that fit well and both are essentially 1 
for model 1. The RMSEA will be .05 or less for 
models that fit well and it is essentially zero for 
model 1. By all indications, model 1 provides 
an excellent fit. 
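The p values and RMSEA point estimates in Table 31.2 can be approximately reproduced from the chi-square values, degrees of freedom, and N = 141. The sketch below uses only the standard library; `chi2_sf` relies on the closed-form series for integer degrees of freedom, and the RMSEA formula shown (with N - 1 in the denominator) is one common version, so a given program's output may differ slightly.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r of (x/2)**r / r!, r = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd half-powers of x.
    total = 0.0
    term = sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * total

def rmsea(chi2, df, n):
    """RMSEA point estimate; zero whenever chi-square does not exceed df."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Chi-square and df values from Table 31.2; N = 141.
fits = {"model 1": (3.51, 8), "model 2": (6.36, 9), "model 3": (24.09, 10)}
for name, (chi2, df) in fits.items():
    print(name, round(chi2_sf(chi2, df), 2), round(rmsea(chi2, df, 141), 2))
```

For models 1 and 2 the chi-square is smaller than its degrees of freedom, so the RMSEA point estimate is zero, while model 3 gives a value of about .10.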

The parameter estimates are also shown in
Table 31.2. The variance of the intercept is
strongly significant (critical ratio greater than
1.96 or less than -1.96 implies p < .05 by a
two-tailed test) but the variance of the slope
is only marginally significant. This is not particularly
surprising for the intercept because it
represents true individual differences at time
zero which is the first assessment point. For
the slope, however, the marginally significant
variance implies that strong evidence is lacking
for individual differences in change beyond
what would be expected by sampling variation
alone (i.e., that are potentially explainable).⁷
The covariance of the slope and intercept is not
significant although it standardizes to a fairly
substantial correlation of .44, a point we will
return to shortly. Taken at face value, this means
that there is no statistically reliable
association between where a boy starts in grade
4 and how he will change over time. For any
given starting point, he is just as likely to go up
as to go down.

Table 31.2 Parameter estimates (Est.), standard errors (SE) and critical ratios (CR) for DPA growth models

                                 Model 1                  Model 2                  Model 3
                                 Est.     SE      CR      Est.     SE      CR      Est.     SE      CR
Means
  DPA intercept                  46.26    2.30    20.11   46.26    2.42    19.14   46.28    2.67    17.34
  DPA slope                      1.93     0.64    3.02    1.92     0.67    2.86    1.90     0.58    3.28
cov(DPA intercept, DPA slope)    35.87    19.89   1.80
Variances
  DPA intercept                  440.05   92.51   4.76    531.77   85.08   6.25    662.44   91.71   7.22
  DPA slope                      14.99    7.69    1.95    22.80    6.74    3.39
Residual variances
  DPA4-DPA8                      367.91   31.19   11.80   351.34   27.31   12.86   411.73   28.51   14.44
Chi-square                       3.51                     6.36                     24.09
DF                               8.00                     9.00                     10.00
P value                          0.90                     0.70                     0.01
CFI                              1.00                     1.00                     0.95
TLI                              1.01                     1.01                     0.97
RMSEA 90% CI (lower limit,       0.00     0.00    0.04    0.00     0.00    0.07    0.05     0.10    0.15
  estimate, upper limit)

The mean of the intercept factor is 46.3 and 
significant but again this is not particularly 
interesting because it is essentially a test that 
the mean of the observed DPA measure at the 
first assessment is not zero. Unless normative 
data were available for these DPA measures, 
this is not a particularly useful fact. The mean 
of the slope factor, however, is 1.93 and sig- 
nificantly different from zero. In other words, 
although some individuals are going up and 
some are going down, overall for the entire sam- 
ple, there is an increase in DPA from grade 4 
to 8 of about 1.93*4 = 7.72. This suggests that


7If the test for the variance of the slope was not significant,
and it is good to keep in mind that tests of
variance parameters being zero in SEM tend to be
biased against the nonzero alternative (see Pinheiro
and Bates, 2000, pp. 84-87), it would suggest that
the apparent change visible in Figure 31.5 is due to
chance variation and may not be explainable. It is
still possible that small amounts of slope variation 
that are apparently nonsignificant by a biased sta- 
tistical test are nonetheless due to some systematic 
cause. If it was hypothesized a priori and verified 
that some predictor significantly predicted the Slope 
factor, this would be evidence against the hypothesis 
of chance variation. 
for the population, as boys make the transition
from elementary to middle school, they tend 
to associate with peers who are more deviant. 
The mean shift of 7.72 from grade 4 to grade 8 
is about .37 of the standard deviation of the 
intercept, suggesting a medium effect size for 
population level growth. 
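The standardized quantities quoted for model 1 follow directly from the estimates in Table 31.2, as this short check illustrates (variable names are illustrative):

```python
from math import sqrt

# Model 1 estimates from Table 31.2.
var_int, var_slope, cov_is = 440.05, 14.99, 35.87
mean_slope = 1.93

# Intercept-slope covariance expressed as a correlation.
corr = cov_is / sqrt(var_int * var_slope)

# Mean change from grade 4 to grade 8 (4 units of time), and that change
# relative to the standard deviation of the intercept factor.
shift = mean_slope * 4
effect = shift / sqrt(var_int)

print(round(corr, 2), round(shift, 2), round(effect, 2))
```

The effect size here is scaled by the standard deviation of true initial status (the intercept factor), not by the standard deviation of the observed grade 4 scores, which would be larger because it also contains residual variance.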

The residual variance is 367.91 at each point 
in time which implies that the intercept and 
slope factors account for 55, 64, 68, and 72% 
of the variance of the observed measures at 
the first through fourth points of assessment 
respectively. If the variance of the observed 
measures is increasing due to the individual
differences in change over time, and if
the residual variances are constrained to be
equal across time, then the R² for the individual
observed variables must go up. Intuitively,
this means that the growth process is creating
increasing amounts of true score variance relative
to error variance, so the reliability of the
measures, the R², must go up.
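The R² values can be approximately recomputed from the rounded model 1 estimates in Table 31.2 and the factor loadings 0, 2, 3, and 4. This is a sketch using the published, rounded estimates, so small discrepancies from the reported percentages (for example, about 54 to 55% at the first assessment) are to be expected.

```python
# Model 1 estimates from Table 31.2.
var_int, var_slope, cov_is, resid = 440.05, 14.99, 35.87, 367.91
loadings = [0, 2, 3, 4]  # grades 4, 6, 7, and 8

r2 = []
for lam in loadings:
    # Variance due to the growth factors at this time point.
    true_var = var_int + lam ** 2 * var_slope + 2 * lam * cov_is
    r2.append(true_var / (true_var + resid))

print([round(value, 3) for value in r2])
```

The growth-factor variance rises with the loading while the residual variance is held constant, so the proportion of explained variance increases at each successive assessment.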

Although the estimates all seem reasonable,
are within the admissible parameter
space,⁸ and do not show troublesome levels of

8The admissible parameter space refers to the set of
parameters that make statistical sense, that is, non- 
negative variances and correlations between —1 and 
1. Most SEM programs will return values outside of 
this range if they happen to maximize the fit of the 
model to the data and these are usually referred to as 
improper solutions. An improper solution may indi- 
cate a serious problem such as an unidentified or 
badly misspecified model so they should be carefully 
investigated. On the other hand, an improper solu- 
tion may also arise because of sampling variation and 
a population parameter that is truly close to or on 
the boundary of the admissible parameter space. In 
this case, the parameter should not be significantly 
different from the boundary value. Keep in mind, 
however, that standard tests involving boundary val- 
ues for variance parameters can be biased against the 
non-boundary alternative, especially in modest sam- 
ples. See Pinheiro and Bates (2000, pp. 84-87) for 
more discussion. 
confounding (the largest correlation among the
parameter estimates, which are not shown, was
-.50), the fact that the slope variance is only
marginally significant and the intercept-slope
correlation, despite being .44, is not significant
merits a closer look. Model 2 in Table 31.2
shows the result of dropping the intercept-slope
covariance from the model. The nested
chi-square test comparing model 2 to model
1 yields 2.85 with 1 degree of freedom, p = .09,
indicating that dropping the intercept-slope
covariance does not cause a significant degradation
in fit, consistent with the lack of significance
of the same parameter in model 1. Interestingly,
the slope variance in model 2 is now
substantially larger (about a 50% increase), the
slope variance standard error drops slightly, and
the critical ratio, 3.39, is now strongly significant,
p < .001. Model 3 shows results when
the slope variance is dropped from the model,
and the nested chi-square test comparing
model 2 to model 3 yields 17.73 with 1 degree
of freedom, highly significant and consistent
with the strong significance of the same parameter
in model 2. From this series of model
fits, we see that model 1 is misleading because
the data are somewhat consistent with a positive
intercept-slope covariance (e.g., the increasing
covariances between grade 4 DPA and grades
6, 7, and 8 DPA can only come from a positive
intercept-slope covariance, see Table 31.1) but
the remaining covariances are not big enough
to support the significance of the intercept-slope
covariance and the slope variance simultaneously.
Once the intercept-slope covariance
is eliminated, however, the slope variance is
strongly significant. Fortunately, dropping the
slope variance and keeping the intercept-slope
covariance does not make sense because variation
is a prerequisite for covariation, so the
choice among models 1, 2, and 3 is straightforward
and we will build on model 2 for the
rest of the models in this chapter.
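The nested chi-square comparisons can be reproduced from the fit statistics in Table 31.2. The sketch below handles only the 1-degree-of-freedom case, for which the chi-square tail probability reduces to a complementary error function; the function names are illustrative.

```python
from math import erfc, sqrt

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return erfc(sqrt(x / 2))

# (chi-square, degrees of freedom) for each model in Table 31.2.
fits = {"model 1": (3.51, 8), "model 2": (6.36, 9), "model 3": (24.09, 10)}

def nested_test(restricted, full):
    """Chi-square difference test for two nested models differing by 1 df."""
    d_chi2 = fits[restricted][0] - fits[full][0]
    d_df = fits[restricted][1] - fits[full][1]
    if d_df != 1:
        raise ValueError("this sketch only handles 1-df comparisons")
    return d_chi2, chi2_sf_1df(d_chi2)

d21, p21 = nested_test("model 2", "model 1")  # drop the intercept-slope covariance
d32, p32 = nested_test("model 3", "model 2")  # drop the slope variance
print(round(d21, 2), round(p21, 2))
print(round(d32, 2), p32)
```

As the footnotes caution, the second comparison tests a variance on the boundary of the parameter space, so its nominal p value is, if anything, conservative.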

Building a growth model is the first step 
toward answering the question posed earlier 


about why some subjects go up, others go down, 
and still others stay the same across time. 
Adding covariates to the model helps answer 
the question. When the covariates are measured 
at or before the first time period, a significant 
effect on the slope is powerful evidence that 
they are implicated in the growth process. If 
the covariates are measured somewhere in the 
middle or towards the end of the developmen- 
tal period under study, it becomes less clear 
whether they are a determinant or consequence 
of growth. For DPA, family, child and parent 
attributes are obvious choices as determinants 
of future growth. For our purposes here, we 
will consider grade 4 covariates of the boy’s 
academic skill and the father’s SES as poten- 
tial predictors. Before we actually fit the model, 
we first examine the structural equations and 
check for identification problems. 

Suppose the basic growth curve model, illus- 
trated in Figure 31.4, is augmented to 


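\[
\begin{aligned}
y_{1i} &= \eta_{1i} + 0\,\eta_{2i} + \varepsilon_{1i} \\
y_{2i} &= \eta_{1i} + 1\,\eta_{2i} + \varepsilon_{2i} \\
y_{3i} &= \eta_{1i} + 2\,\eta_{2i} + \varepsilon_{3i} \\
y_{4i} &= \eta_{1i} + 3\,\eta_{2i} + \varepsilon_{4i} \\
\eta_{1i} &= \Gamma_{11} x_{1i} + \zeta_{1i} + \alpha_1 \\
\eta_{2i} &= \Gamma_{21} x_{1i} + \zeta_{2i} + \alpha_2
\end{aligned}
\]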


The model is shown below in Figure 31.7. 
Notice now that the intercept and slope factors 
have become dependent variables, predicted by 
x1, and both now have residual variances and 
intercepts. There is also a covariance between 
the growth factor residuals to reflect other 
unmeasured causes that might cause the inter- 
cept and slope factors to be correlated. We have 
previously shown that the means and variances 
of the intercept and slope factors were identi- 
fied. If we can show that any new parameters 
that go into making up the means and variances 
of the intercept and slope factors are identified 
(i.e., $\Gamma_{11}$ and $\Gamma_{21}$), we can use our previous
results to argue that the rest of the model is
identified once $\Gamma_{11}$ and $\Gamma_{21}$ are identified.

Figure 31.7 Path diagram for linear growth curve
model with pre-existing predictor of intercept and
slope

The mean and variance of x1 are of course 
just taken as their sample values since x1 is 
an independent variable. The covariance of x1 
with y1 is 


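\[
\begin{aligned}
\operatorname{cov}(x_1, y_1) &= \operatorname{cov}(x_1, \eta_1 + 0\,\eta_2 + \varepsilon_1) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + \alpha_1 + \varepsilon_1) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + \varepsilon_1) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1) + \operatorname{cov}(x_1, \zeta_1) + \operatorname{cov}(x_1, \varepsilon_1) \\
&= \Gamma_{11} \operatorname{var}(x_1)
\end{aligned}
\tag{13}
\]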
In step 2 of equation (13), the definition of
$\eta_1$ was substituted in and $\eta_2$ was dropped
because it is multiplied by zero. In step 3, $\alpha_1$
was dropped since it is a constant and therefore
cannot contribute to covariation. What
equation (13) shows is that $\Gamma_{11}$ is identified
because var(x1) is a given. The covariance of
x1 with y2 is


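\[
\begin{aligned}
\operatorname{cov}(x_1, y_2) &= \operatorname{cov}(x_1, \eta_1 + \eta_2 + \varepsilon_2) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + \alpha_1 + \Gamma_{21} x_1 + \zeta_2 + \alpha_2 + \varepsilon_2) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + \Gamma_{21} x_1 + \zeta_2 + \varepsilon_2) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1) + \operatorname{cov}(x_1, \zeta_1) + \operatorname{cov}(x_1, \Gamma_{21} x_1) + \operatorname{cov}(x_1, \zeta_2) + \operatorname{cov}(x_1, \varepsilon_2) \\
&= \Gamma_{11} \operatorname{var}(x_1) + 0 + \Gamma_{21} \operatorname{var}(x_1) + 0 + 0 \\
&= (\Gamma_{11} + \Gamma_{21}) \operatorname{var}(x_1)
\end{aligned}
\tag{14}
\]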


As in the previous equation, we substituted the
definitions of $\eta_1$ and $\eta_2$ in and dropped the
constants. Now, since var(x1) is a given and $\Gamma_{11}$
is already identified, $\Gamma_{21}$ is identified. Thus
the model is completely identified and, in fact,
overidentified because we have 5(5+3)/2 = 20
total degrees of freedom minus 13 estimated
parameters, leaving 7 degrees of freedom to
test the model.
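
The degrees-of-freedom count above is simple bookkeeping and can be sketched in a few lines. The function name is an illustrative assumption. The second call, with 9 parameters, is one reading of Table 31.3 (equal residual variances across DPA4-DPA8 and no covariance between the growth factor residuals), not a count stated in the text.

```python
def sem_degrees_of_freedom(n_observed, n_free_parameters):
    """Degrees of freedom for a mean-and-covariance structure model.

    p observed variables supply p means plus p(p+1)/2 distinct variances
    and covariances, i.e. p(p+3)/2 sample moments in total.
    """
    n_moments = n_observed * (n_observed + 3) // 2
    return n_moments - n_free_parameters

# y1-y4 plus x1 gives 5 observed variables; the general model estimates
# 13 parameters, as stated in the text.
df_general = sem_degrees_of_freedom(5, 13)

# Assumed count for the fitted model: 2 regression weights, 2 intercepts,
# 2 factor residual variances, 1 common residual variance, and the mean
# and variance of x1 (9 parameters).
df_fitted = sem_degrees_of_freedom(5, 9)

print(df_general, df_fitted)
```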

Returning now to our empirical example, 
Figure 31.8 shows grade 4 academic skill and 
father SES against each subsequent DPA mea- 
sure. This is a powerful graphical technique for 
identifying early predictors of future change. 
This can be seen by computing the regres- 
sion weight for the early predictor against each 
repeated assessment of the outcome assuming 
the model in Figure 31.7 holds. These regres- 
sion weights are the covariance of the outcome, 
each successive DPA measure, y1 through y4, 
and the predictor, x1, divided by the variance of 
the predictor, x1. The regression weight for y1 is 


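\[
\beta_{x_1, y_1} = \frac{\operatorname{cov}(x_1, y_1)}{\operatorname{var}(x_1)} = \frac{\Gamma_{11} \operatorname{var}(x_1)}{\operatorname{var}(x_1)} = \Gamma_{11}
\tag{15}
\]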


The regression weight for y2 is 

\[
\beta_{x_1, y_2} = \frac{\operatorname{cov}(x_1, y_2)}{\operatorname{var}(x_1)} = \frac{(\Gamma_{11} + \Gamma_{21}) \operatorname{var}(x_1)}{\operatorname{var}(x_1)} = \Gamma_{11} + \Gamma_{21}
\tag{16}
\]

Figure 31.8 Scatterplots of grade 4 academic skill and father SES versus repeated assessments of DPA, grade 4 to 8

To get the regression weight for y3, we first
compute the covariance of x1 and y3.


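\[
\begin{aligned}
\operatorname{cov}(x_1, y_3) &= \operatorname{cov}(x_1, \eta_1 + 2\eta_2 + \varepsilon_3) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + \alpha_1 + 2\Gamma_{21} x_1 + 2\zeta_2 + 2\alpha_2 + \varepsilon_3) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + 2\Gamma_{21} x_1 + 2\zeta_2 + \varepsilon_3) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1) + \operatorname{cov}(x_1, \zeta_1) + \operatorname{cov}(x_1, 2\Gamma_{21} x_1) + \operatorname{cov}(x_1, 2\zeta_2) + \operatorname{cov}(x_1, \varepsilon_3) \\
&= \Gamma_{11} \operatorname{var}(x_1) + 0 + 2\Gamma_{21} \operatorname{var}(x_1) + 0 + 0 \\
&= (\Gamma_{11} + 2\Gamma_{21}) \operatorname{var}(x_1)
\end{aligned}
\tag{17}
\]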


The regression weight is 


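\[
\beta_{x_1, y_3} = \frac{\operatorname{cov}(x_1, y_3)}{\operatorname{var}(x_1)} = \frac{(\Gamma_{11} + 2\Gamma_{21}) \operatorname{var}(x_1)}{\operatorname{var}(x_1)} = \Gamma_{11} + 2\Gamma_{21}
\tag{18}
\]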


To get the regression weight for y4, we first 
compute the covariance of x1 and y4. 


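\[
\begin{aligned}
\operatorname{cov}(x_1, y_4) &= \operatorname{cov}(x_1, \eta_1 + 3\eta_2 + \varepsilon_4) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + \alpha_1 + 3\Gamma_{21} x_1 + 3\zeta_2 + 3\alpha_2 + \varepsilon_4) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1 + \zeta_1 + 3\Gamma_{21} x_1 + 3\zeta_2 + \varepsilon_4) \\
&= \operatorname{cov}(x_1, \Gamma_{11} x_1) + \operatorname{cov}(x_1, \zeta_1) + \operatorname{cov}(x_1, 3\Gamma_{21} x_1) + \operatorname{cov}(x_1, 3\zeta_2) + \operatorname{cov}(x_1, \varepsilon_4) \\
&= \Gamma_{11} \operatorname{var}(x_1) + 0 + 3\Gamma_{21} \operatorname{var}(x_1) + 0 + 0 \\
&= (\Gamma_{11} + 3\Gamma_{21}) \operatorname{var}(x_1)
\end{aligned}
\tag{19}
\]

The regression weight is

\[
\beta_{x_1, y_4} = \frac{\operatorname{cov}(x_1, y_4)}{\operatorname{var}(x_1)} = \frac{(\Gamma_{11} + 3\Gamma_{21}) \operatorname{var}(x_1)}{\operatorname{var}(x_1)} = \Gamma_{11} + 3\Gamma_{21}
\tag{20}
\]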


It is easy to see that the bivariate regression
weights start at $\Gamma_{11}$ and then grow linearly over
time by the amount $\Gamma_{21}$ for each unit of time
if the model in Figure 31.7 holds. Of course,
we would expect sampling variation to mask,
somewhat, the nice linear progression, but overall
the regression weights should tend on average
to change linearly. If the regression weights
do not systematically change, it suggests that
$\Gamma_{21}$ is zero or, in other words, that the variable
does not predict linear growth. If the weights
seem to change in some systematic pattern but
not linearly, it suggests that the predictor predicts
nonlinear growth. If the regression at each
point in time is nonlinear but changes systematically
over time, it suggests the predictor
has a nonlinear relation to linear or possibly
nonlinear growth in the outcome. Clearly, these
more complicated patterns will be difficult to
discern with only 4 points in time unless the
relation is a very strong one.
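
The linear progression in equations (15) to (20) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the parameter values, the sample size, the seed, and the function name are all made up for illustration and are unrelated to the OYS data. Data are generated from the model of Figure 31.7 with time codes 0, 1, 2, 3, and each repeated measure is then regressed on x1.

```python
import random

def simulate_regression_weights(gamma11, gamma21, n=20000, seed=1):
    """Simulate the predictor model and return cov(x1, yt) / var(x1) for each t."""
    rng = random.Random(seed)
    times = [0, 1, 2, 3]
    xs = []
    ys = [[] for _ in times]
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Growth factors: regression on x1 plus a residual (zeta) and a constant (alpha).
        eta1 = 5.0 + gamma11 * x + rng.gauss(0.0, 1.0)
        eta2 = 1.0 + gamma21 * x + rng.gauss(0.0, 0.5)
        xs.append(x)
        for k, t in enumerate(times):
            # Repeated measure: intercept + time * slope + measurement error.
            ys[k].append(eta1 + t * eta2 + rng.gauss(0.0, 1.0))
    mean_x = sum(xs) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    weights = []
    for y in ys:
        mean_y = sum(y) / n
        cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, y)) / (n - 1)
        weights.append(cov_xy / var_x)
    return weights

# Hypothetical values: Gamma11 = -0.8, Gamma21 = -0.2.
weights = simulate_regression_weights(-0.8, -0.2)
for t, w in enumerate(weights):
    print(f"time {t}: estimated weight {w:.3f}, model value {-0.8 - 0.2 * t:.3f}")
```

Up to sampling error, each estimated weight should sit close to Gamma11 + t * Gamma21, which is the pattern to look for when plotting regression weights against time.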

Returning now to Figure 31.8, the linear effect 
of boy academic skill is negative in direction 
and becomes stronger from grade 4 to 6, but 
then weaker at grade 7, and then stronger again 
at grade 8. The relation also appears to become 
increasingly nonlinear over time. The apparent 
nonlinear effect is very interesting and could 
signal a threshold effect or perhaps a nonaddi- 
tive interaction with some other predictor, but 
we will not pursue it here. The effect of father 
SES is negative in direction and is essentially 
linear and becomes steadily stronger over time. 
We can get a better look at how the regres- 
sion weights change over time by plotting them 
against time for each predictor. Such a plot is 
shown in Figure 31.9. As can be seen, father
SES shows a stronger and more smoothly linear
increase over time, which suggests that it is a
better predictor of linear change in DPA than
boy academic skill.

Figure 31.9 Regression weights versus grades for regression of repeated assessments of DPA, grade 4 to 8, on grade 4 academic skill and father SES

Figure 31.10 Ordinary least-squares slopes for DPA, grade 4 to 8, versus grade 4 predictors, academic skill and father SES

Another useful plot is shown in Figure 31.10.
Here, individual growth curves have been fit for
each subject by regressing their 4 DPA values
on the 4 time values, 0, 2, 3, and 4, in order
to get individual slopes. The plot on the right
side of the page for father SES shows a significant
linear relation with DPA slopes (the test
statistics are in the top margin). The plot on
the left shows that the boy’s academic skill in
grade 4 does not predict linear growth in DPA.
In Figure 31.11, grade 9 academic skill is plotted
against all potential predictors. The bottom
right plot looks at the boy’s academic skill in
grade 9 as a function of linear growth in DPA
from grade 4 to 8. This plot looks highly nonlinear
on both ends of the DPA slope distribution.
The nonlinearity in the high end is due
to 4 cases and may not be very replicable. The
nonlinearity in the low end is more substantial
and appears to involve almost one-third of the
sample and begin around a DPA slope of zero.
The plots on the bottom left and top right for
grade 4 DPA and father SES respectively indicate
significant linear relations. The plot on the
top left for grade 4 academic skill suggests a
nonlinear relation on the low end of the grade 4
distribution. The apparent nonlinear relations
are interesting and deserve more attention as
potential challenges to existing theory if replicable,
but we will not pursue them here.

Figure 31.11 Grade 9 academic skill versus grade 4 predictors and ordinary least-squares slopes for DPA, grade 4 to 8
[Axis labels recovered from the figure: Academic skill grade 9; DPA grade 4]
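
The individual slopes plotted in Figure 31.10 come from one ordinary least-squares regression per subject. A minimal sketch follows, assuming made-up DPA values for three hypothetical boys (the OYS data are not reproduced here) and the time values 0, 2, 3, and 4 given in the text; the function and variable names are illustrative.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

# Time values for grades 4, 6, 7, and 8.
TIMES = [0, 2, 3, 4]

# Hypothetical DPA scores: one boy going up, one flat, one going down.
dpa_by_subject = {
    "boy_A": [40.0, 44.0, 46.0, 48.0],
    "boy_B": [50.0, 50.0, 50.0, 50.0],
    "boy_C": [60.0, 54.0, 51.0, 48.0],
}

slopes = {sid: ols_slope(TIMES, scores) for sid, scores in dpa_by_subject.items()}
for sid, slope in slopes.items():
    print(sid, round(slope, 2))
```

The resulting slopes can then be plotted against any grade 4 predictor, or used as the horizontal axis for a later outcome, as in Figures 31.10 and 31.11.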

The linear growth model of Figure 31.7 was
fit with father SES as the predictor. The results
are shown below in Table 31.3.

Table 31.3 Parameter estimates, standard errors
(SE) and critical ratios (Est./SE) for DPA growth
models with grade 4 father SES (FSES4)

                                   Estimates      SE   Est./SE
DPA intercept on father SES 4          -0.79    0.20     -4.01
DPA slope on father SES 4              -0.16    0.06     -2.82
Intercepts
  DPA intercept                        71.90    6.79     10.60
  DPA slope                             7.07    1.94      3.65
Residual variances
  DPA intercept                       413.51   69.39      5.96
  DPA slope                            16.51    5.84      2.83
  DPA4-DPA8                           356.73   27.58     12.93
Chi-square                              4.63
DF                                     11.00
P value                                 0.95
CFI                                     1.00
TLI                                     1.02
RMSEA 90% CI
  (lower, estimate, upper)              0.00    0.00      0.01

The model has an excellent fit: the chi-square is 4.63 with 11
degrees of freedom, which generates a p value of
.95. The CFI, TLI and RMSEA are also indicative
of an excellent fit. The effect of father SES
on both intercept and slope factors is highly
significant (z = -4.01 and z = -2.82, respectively)
and accounts for 16% of the variance
for both the intercept and slope factor. The fact
that father SES predicts the intercept and slope
factors of DPA means that boys with low SES
fathers tend to both start higher in 4th grade
on DPA and grow faster from 4th to 8th grade
on DPA than boys with high SES fathers. The
regression intercepts for the intercept and slope
factors are not really interesting because they
represent mean levels for boys with father SES
scores of zero, which is beyond the range of the
observed data in the OYS.
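
To make the interpretation concrete, the Table 31.3 estimates can be turned into expected growth factor values for particular father SES scores. This is a rough sketch only: the two SES scores used (20 and 45) are hypothetical, chosen for illustration rather than taken from the OYS, and the function name is an assumption.

```python
# Estimates copied from Table 31.3.
INTERCEPT_CONSTANT = 71.90     # regression intercept for the DPA intercept factor
SLOPE_CONSTANT = 7.07          # regression intercept for the DPA slope factor
INTERCEPT_ON_SES = -0.79       # DPA intercept on father SES 4
SLOPE_ON_SES = -0.16           # DPA slope on father SES 4

def expected_growth_factors(father_ses):
    """Expected DPA intercept and slope factor values at a given father SES."""
    expected_intercept = INTERCEPT_CONSTANT + INTERCEPT_ON_SES * father_ses
    expected_slope = SLOPE_CONSTANT + SLOPE_ON_SES * father_ses
    return expected_intercept, expected_slope

# Hypothetical low and high father SES scores.
low_intercept, low_slope = expected_growth_factors(20.0)
high_intercept, high_slope = expected_growth_factors(45.0)

print(f"low SES:  intercept {low_intercept:.2f}, slope {low_slope:.2f}")
print(f"high SES: intercept {high_intercept:.2f}, slope {high_slope:.2f}")
```

Because both regression weights are negative, the lower SES score gives both a higher expected starting level and a steeper expected slope, which is the pattern described in the text.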

The last step in our introduction to growth 
curve analysis will be to examine the impact 
of growth in DPA on future outcomes, in 


particular academic skill in grade 9, one year 
after the growth interval. Here, we attempt to 
assess the consequences of a particular pattern 
of growth in DPA. Does a high rate of growth 
on DPA during middle school have a nega- 
tive impact on academic skill during the first 
year of high school, over and above initial sta- 
tus of DPA? If so, then it immediately raises 
the question of whether father SES might also 
have an impact on academic skill since we have 
seen that father SES has a significant impact on 
growth rates of DPA. We can examine whether 
or not the father SES effect is mediated through 
growth in DPA or if it has a direct effect on 
academic skill in grade 9. 

First, consider the model shown in
Figure 31.4 and suppose we add another y
variable, y5, academic skill in grade 9. The new
model is shown in Figure 31.12. Notice that
the path diagram has been altered compared
to Figure 31.4 and in particular it has been
simplified by eliminating the circles for the $\zeta$s
and $\varepsilon_5$. As models become more complicated,
it usually becomes necessary to start leaving
out parts of the path diagram in order to clearly
communicate the essential aspects of the
model. The intercept and slope factor loadings
on y5 are now freely estimated regression
weights since academic skill is a conceptually
distinct outcome of growth and not part of the
DPA series. In addition, unlike y1 to y4, y5
has a nonzero regression intercept indicated
by $\nu_5$.

The first question is, are these loadings 
identifiable? The structural equations for the 
model are 


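\[
\begin{aligned}
y_{1i} &= \eta_{1i} + 0\,\eta_{2i} + \varepsilon_{1i} \\
y_{2i} &= \eta_{1i} + 1\,\eta_{2i} + \varepsilon_{2i} \\
y_{3i} &= \eta_{1i} + 2\,\eta_{2i} + \varepsilon_{3i} \\
y_{4i} &= \eta_{1i} + 3\,\eta_{2i} + \varepsilon_{4i} \\
y_{5i} &= \lambda_{51} \eta_{1i} + \lambda_{52} \eta_{2i} + \varepsilon_{5i} + \nu_5
\end{aligned}
\tag{21}
\]

Figure 31.12 Path diagram for linear growth curve
model with intercept and slope predicting a distal
outcome

The covariance of y5 with y1 is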


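\[
\begin{aligned}
\operatorname{cov}(y_1, y_5) &= \operatorname{cov}(\eta_1 + \varepsilon_1, \lambda_{51}\eta_1 + \lambda_{52}\eta_2 + \varepsilon_5 + \nu_5) \\
&= \operatorname{cov}(\eta_1, \lambda_{51}\eta_1) + \operatorname{cov}(\eta_1, \lambda_{52}\eta_2) + \operatorname{cov}(\eta_1, \varepsilon_5) \\
&\quad + \operatorname{cov}(\varepsilon_1, \lambda_{51}\eta_1) + \operatorname{cov}(\varepsilon_1, \lambda_{52}\eta_2) + \operatorname{cov}(\varepsilon_1, \varepsilon_5) \\
&= \lambda_{51} \operatorname{var}(\eta_1) + \lambda_{52} \operatorname{cov}(\eta_1, \eta_2).
\end{aligned}
\tag{22}
\]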


The covariance of y5 with y2 is 


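\[
\begin{aligned}
\operatorname{cov}(y_2, y_5) &= \operatorname{cov}(\eta_1 + \eta_2 + \varepsilon_2, \lambda_{51}\eta_1 + \lambda_{52}\eta_2 + \varepsilon_5 + \nu_5) \\
&= \operatorname{cov}(\eta_1, \lambda_{51}\eta_1) + \operatorname{cov}(\eta_1, \lambda_{52}\eta_2) + \operatorname{cov}(\eta_1, \varepsilon_5) \\
&\quad + \operatorname{cov}(\varepsilon_2, \lambda_{51}\eta_1) + \operatorname{cov}(\varepsilon_2, \lambda_{52}\eta_2) + \operatorname{cov}(\varepsilon_2, \varepsilon_5) \\
&\quad + \operatorname{cov}(\eta_2, \lambda_{51}\eta_1) + \operatorname{cov}(\eta_2, \lambda_{52}\eta_2) + \operatorname{cov}(\eta_2, \varepsilon_5) \\
&= \lambda_{51} \operatorname{var}(\eta_1) + \lambda_{52} \operatorname{cov}(\eta_1, \eta_2) + \lambda_{51} \operatorname{cov}(\eta_2, \eta_1) + \lambda_{52} \operatorname{var}(\eta_2) \\
&= \lambda_{51} [\operatorname{var}(\eta_1) + \operatorname{cov}(\eta_2, \eta_1)] + \lambda_{52} [\operatorname{cov}(\eta_1, \eta_2) + \operatorname{var}(\eta_2)].
\end{aligned}
\tag{23}
\]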


We already know from equations (6) to (10)
that the variances of $\eta_1$ and $\eta_2$ and the covariance
are identified, so equations (22) and (23)
appear to be two independent linear equations
in the two unknowns, $\lambda_{51}$ and $\lambda_{52}$. By applying
the determinant test* with $a_{11} = \operatorname{var}(\eta_1)$,
$a_{12} = \operatorname{cov}(\eta_1, \eta_2)$, $a_{21} = \operatorname{var}(\eta_1) + \operatorname{cov}(\eta_1, \eta_2)$,
and $a_{22} = \operatorname{cov}(\eta_1, \eta_2) + \operatorname{var}(\eta_2)$ we find

\[
\begin{aligned}
a_{11} a_{22} - a_{12} a_{21} &= \operatorname{var}(\eta_1)[\operatorname{cov}(\eta_1, \eta_2) + \operatorname{var}(\eta_2)] - \operatorname{cov}(\eta_1, \eta_2)[\operatorname{var}(\eta_1) + \operatorname{cov}(\eta_1, \eta_2)] \\
&= \operatorname{var}(\eta_1)\operatorname{cov}(\eta_1, \eta_2) + \operatorname{var}(\eta_1)\operatorname{var}(\eta_2) - \operatorname{cov}(\eta_1, \eta_2)\operatorname{var}(\eta_1) - \operatorname{cov}(\eta_1, \eta_2)^2 \\
&= \operatorname{var}(\eta_1)\operatorname{var}(\eta_2) - \operatorname{cov}(\eta_1, \eta_2)^2 \\
&= \operatorname{var}(\eta_1)\operatorname{var}(\eta_2) - \frac{\operatorname{cov}(\eta_1, \eta_2)^2}{\operatorname{var}(\eta_1)\operatorname{var}(\eta_2)} \times \operatorname{var}(\eta_1)\operatorname{var}(\eta_2) \\
&= \operatorname{var}(\eta_1)\operatorname{var}(\eta_2)[1 - \operatorname{cor}(\eta_1, \eta_2)^2].
\end{aligned}
\tag{24}
\]

Equation (24) is only zero when $\eta_1$ and $\eta_2$ are
perfectly correlated or when either $\eta_1$ or $\eta_2$
has a variance equal to zero; so long as neither
of these things is true, $\lambda_{51}$ and $\lambda_{52}$ are identified.

*The determinant test is fairly easy and useful to use
for two linear equations with two unknowns. The
two equations will be independent and have a unique
solution if the determinant is not zero. Suppose
we have two unknowns, x and y, and two equations
for x and y

\[
\begin{aligned}
a_{11} x + a_{12} y &= c_1 \\
a_{21} x + a_{22} y &= c_2
\end{aligned}
\]

The determinant is $a_{11} a_{22} - a_{21} a_{12}$ and so long
as the determinant is not zero, the equations have a
unique solution. For LGC, the coefficients (the a’s)
will be elements from either the observed mean vector,
the observed covariance matrix, or previously
identified model parameters. As noted in the main
text, the determinant is zero if the Slope variance
is zero and, as also previously noted, if our running
example model includes the covariance between
the Intercept and Slope, the Slope variance is only
marginally significant. Being marginally significant
means that the Slope variance is close to zero,
probabilistically speaking, which means the model is
close to being unidentified. When a model is close
to being unidentified, the parameters involved
frequently have highly inflated standard errors, which
indicates that the data do not very precisely estimate
the parameters in question, and this does indeed
happen to the effects of the Intercept and Slope on grade
9 academic skill when the Intercept-Slope covariance
is included in the model. Both have highly
inflated standard errors and neither is significant
until the Intercept-Slope covariance is removed from
the model, as shown in Table 31.4.
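
The determinant argument for equations (22) and (23) can be sketched numerically with Cramer's rule. The example below is illustrative only: it takes the factor variances and loadings reported in Table 31.4, assumes a zero intercept-slope covariance as in model 2, builds the two implied covariances, and then recovers the loadings from them. The function name is an assumption, not something from the chapter.

```python
def solve_distal_loadings(var_eta1, var_eta2, cov_eta, cov_y1_y5, cov_y2_y5):
    """Solve equations (22) and (23) for lambda51 and lambda52 by Cramer's rule."""
    a11 = var_eta1
    a12 = cov_eta
    a21 = var_eta1 + cov_eta
    a22 = cov_eta + var_eta2
    determinant = a11 * a22 - a12 * a21
    if determinant == 0:
        # Zero slope (or intercept) variance, or perfectly correlated factors.
        raise ValueError("loadings are not identified: determinant is zero")
    lambda51 = (cov_y1_y5 * a22 - a12 * cov_y2_y5) / determinant
    lambda52 = (a11 * cov_y2_y5 - a21 * cov_y1_y5) / determinant
    return lambda51, lambda52, determinant

# Table 31.4 values; the covariance is assumed to be zero (model 2).
var_eta1, var_eta2, cov_eta = 533.93, 22.89, 0.0
true_lambda51, true_lambda52 = -0.02, -0.09

# Covariances implied by equations (22) and (23).
cov_y1_y5 = true_lambda51 * var_eta1 + true_lambda52 * cov_eta
cov_y2_y5 = (true_lambda51 * (var_eta1 + cov_eta)
             + true_lambda52 * (cov_eta + var_eta2))

lambda51, lambda52, determinant = solve_distal_loadings(
    var_eta1, var_eta2, cov_eta, cov_y1_y5, cov_y2_y5)
print(lambda51, lambda52, determinant)
```

Setting `var_eta2` to zero (with a zero covariance) makes the determinant zero and the function raises an error, mirroring the point that the loadings are not identified without slope variance.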
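To show that $\nu_5$ is identified, the mean of y5 is

\[
\begin{aligned}
E(y_5) &= E(\lambda_{51}\eta_1 + \lambda_{52}\eta_2 + \varepsilon_5 + \nu_5) \\
&= \lambda_{51} E(\eta_1) + \lambda_{52} E(\eta_2) + \nu_5 \\
&= \lambda_{51}\alpha_1 + \lambda_{52}\alpha_2 + \nu_5.
\end{aligned}
\tag{25}
\]

But the only unknown in the entire equation
is $\nu_5$, since we have already shown that the $\lambda$’s
and the $\alpha$’s are identified, so $\nu_5$ is identified.
The last step is to identify the variance of $\varepsilon_5$,
which is easily done from the variance of y5

\[
\begin{aligned}
\operatorname{var}(y_5) &= \operatorname{var}(\lambda_{51}\eta_1 + \lambda_{52}\eta_2 + \varepsilon_5 + \nu_5) \\
&= \operatorname{var}(\lambda_{51}\eta_1) + \operatorname{var}(\lambda_{52}\eta_2) + \operatorname{var}(\varepsilon_5) \\
&\quad + 2\operatorname{cov}(\lambda_{51}\eta_1, \lambda_{52}\eta_2) + 2\operatorname{cov}(\lambda_{51}\eta_1, \varepsilon_5) + 2\operatorname{cov}(\lambda_{52}\eta_2, \varepsilon_5) \\
&= \lambda_{51}^2 \operatorname{var}(\eta_1) + \lambda_{52}^2 \operatorname{var}(\eta_2) + \operatorname{var}(\varepsilon_5) + 2\lambda_{51}\lambda_{52}\operatorname{cov}(\eta_1, \eta_2).
\end{aligned}
\tag{26}
\]

The only unknown in the entire equation is
the variance of $\varepsilon_5$ since all other model parameters
are identified, so it too is identified. Thus,
the model with a future outcome predicted by
the latent growth factors is completely identified
and, in fact, overidentified. In addition,
we showed that the model with a predictor of
growth could be identified from just the growth
model itself. Thus, the model with both early
predictors of growth and future consequences
of growth is completely identified. If the early
predictors of growth also have direct effects on
the future consequences, these effects will also
be identifiable because this part of the model is
just observed variable regression.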

Returning now to our empirical example,
Table 31.4 below shows results from the growth
model with grade 9 academic skill included as a
future consequence of growth. The effects of the
intercept and slope of DPA on academic skill
are strongly significant and jointly account for
58% of the variance of academic skill. High
scores on either initial status in grade 4 or
growth rate from grade 4 to 8 of DPA, or both,
substantially lower grade 9 academic skill.

Table 31.4 Parameter estimates, standard errors
(SE) and critical ratios (Est./SE) for DPA growth
models with grade 9 academic skill

                                     Estimates      SE   Est./SE
Academic skill 9 on DPA intercept        -0.02    0.00     -7.06
Academic skill 9 on DPA slope            -0.09    0.02     -4.15
Means
  DPA intercept                          46.27    2.42     19.13
  DPA slope                               1.96    0.67      2.92
Intercepts
  Academic skill 9                        1.16    0.14      8.42
Variances
  DPA intercept                         533.93   85.36      6.26
  DPA slope                              22.89    6.75      3.39
Residual variances
  DPA4-DPA8                             350.95   27.27     12.87
  Academic skill 9                        0.31    0.05      6.02
Chi-square                                9.01
DF                                       11.00
P value                                   0.62
CFI                                       1.00
TLI                                       1.01
RMSEA 90% CI
  (lower, estimate, upper)                0.00    0.00      0.08

Finally, we combine both the early predictors
and future consequences in one overall
model, which is shown in Figure 31.13. Results
are shown in Table 31.5. The four lines labeled
"Academic skill 9 on ..." show the effects of the
growth factors and the grade 4 predictors on
grade 9 academic skill.
The DPA slope factor and grade 4 academic
skill have the strongest effects and both
standardized effects (-.44 and .48 respectively) are
about equal in magnitude. The effect of the
DPA intercept factor is just barely significant
at the .05 level (standardized effect = -.21)
and the effect for father SES is nowhere close.
The stability effect of early academic skill on
later academic skill is not surprising, but the
fact that both DPA intercept and slope predict
academic skill at grade 9, and in fact that the
standardized DPA slope effect (-.44) is larger
than the DPA intercept effect (-.21), highlights
the importance of considering change over
time on DPA. The predictors jointly account
for 75% of the variance of grade 9 academic
skill. Both father SES and the boy’s academic
skill at grade 4 have significant effects
on the intercept of DPA although the father
SES effect is marginal. Only father SES has
a significant effect on the slope of DPA;
academic skill does not. The father SES effect is
consistent with results in Table 31.3; including
grade 4 academic skill in the prediction of the
DPA slope does nothing to change those results.
There is a significant .34 correlation between
father SES and academic skill at grade 4.

Figure 31.13 Path diagram for linear growth curve
model with two pre-existing predictors of intercept
and slope and pre-existing predictors, intercept and
slope predicting a distal outcome

Table 31.5 Parameter estimates, standard errors
(SE) and critical ratios (Est./SE) for DPA growth
models with grade 4 academic skill, father SES and
grade 9 academic skill

                                        Estimates      SE   Est./SE
DPA intercept on academic skill 4          -14.16    2.58     -5.49
DPA intercept on father SES 4               -0.44    0.19     -2.33
DPA slope on academic skill 4               -0.13    0.82     -0.16
DPA slope on father SES 4                   -0.16    0.06     -2.63
Academic skill 9 on DPA intercept           -0.01    0.00     -2.13
Academic skill 9 on DPA slope               -0.09    0.03     -3.39
Academic skill 9 on academic skill 4         0.50    0.09      5.90
Academic skill 9 on father SES 4             0.01    0.01      0.74
Intercepts
  Academic skill 9                           0.37    0.28      1.32
  DPA intercept                             61.13    6.44      9.49
  DPA slope                                  7.09    2.04      3.48
Residual variances
  DPA4-DPA8                                355.42   27.14     13.10
  Academic skill 9                           0.19    0.05      4.07
  DPA intercept                            285.77   54.56      5.24
  DPA slope                                 17.01    5.58      3.05
Chi-square                                  11.75
DF                                          15.00
P value                                      0.70
CFI                                          1.00
TLI                                          1.01
RMSEA 90% CI
  (lower, estimate, upper)                   0.00    0.00      0.06

The model in Table 31.5 also has indirect
effects to consider. Recent work indicates that
for indirect effects the sampling distributions
of the test statistics are not normal, as
was commonly assumed, especially in samples
that are not large (MacKinnon et al., 2002). To
compensate for non-normality and asymmetry
we use a 95% bias-corrected confidence interval
generated by a bootstrapping approach available
in Mplus for the grade 4 predictors. Father
SES has significant indirect effects through both
the DPA intercept (lower limit = 0.001,
estimate = 0.004, upper limit = 0.011) and slope
(lower limit = 0.005, estimate = 0.013, upper
limit = 0.034) on grade 9 academic skill.
Academic skill at grade 4 has a significant indirect
effect through the DPA intercept (lower limit =
0.001, estimate = 0.115, upper limit = 0.236)
and a non-significant effect through the slope
(lower limit = -0.137, estimate = 0.011, upper
limit = 0.193) on academic skill at grade 9.
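
Each indirect effect point estimate is the product of the path from the grade 4 predictor to a growth factor and the path from that growth factor to grade 9 academic skill. The sketch below multiplies the rounded coefficients printed in Table 31.5, so the products only approximate the estimates reported above, which were presumably computed from unrounded values. The dictionary keys and function name are illustrative assumptions, and the bootstrap confidence intervals are not reproduced.

```python
# Paths from grade 4 predictors to growth factors (Table 31.5, rounded).
predictor_to_factor = {
    ("father SES", "DPA intercept"): -0.44,
    ("father SES", "DPA slope"): -0.16,
    ("academic skill 4", "DPA intercept"): -14.16,
    ("academic skill 4", "DPA slope"): -0.13,
}

# Paths from growth factors to grade 9 academic skill (Table 31.5, rounded).
factor_to_outcome = {
    "DPA intercept": -0.01,
    "DPA slope": -0.09,
}

def indirect_effect(predictor, factor):
    """Product-of-coefficients estimate of one indirect effect."""
    return predictor_to_factor[(predictor, factor)] * factor_to_outcome[factor]

indirect = {
    key: indirect_effect(*key) for key in predictor_to_factor
}
for (predictor, factor), value in indirect.items():
    print(f"{predictor} via {factor}: {value:.4f}")
```

All four products are positive because each multiplies two negative paths: higher SES or higher early skill goes with less deviant peer affiliation, which in turn goes with higher grade 9 academic skill.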

The model in Table 31.5 has substantive 
implications for efforts to enhance the scholas- 
tic success for boys in high school. It suggests 
that early efforts to enhance academic skill in 
elementary school will have direct effects on 
later academic skill and will also tend to reduce 
deviant peer affiliation in elementary school 
which in turn will increase academic skill in 
high school. Early efforts to reduce deviant peer 
affiliation will also tend to pay off in terms 
of higher achievement in high school. A high 
priority should also be placed on preventing 
growth in deviant peer affiliation during middle 
school as this has a substantial direct effect on 
high school achievement. On a more specula- 
tive note, increasing academic skill for boys in 
high school may also have an intergenerational 


effect. If increasing academic skill for the boy
in high school leads to higher SES when he
becomes an adult, this would tend to reduce
deviant peer affiliation in his own boys, assuming
he marries and has boys. This in turn would
lead to greater academic achievement and
possibly higher SES.

Chapter 32

Multilevel growth curve analysis for
quantitative outcomes
Douglas A. Luke

Multilevel growth curve modeling is a power- 
ful and flexible statistical technique which can 
be used to model longitudinal data. The pri- 
mary purposes of growth curve modeling are to 
describe the form and structure of change in a 
quantitative dependent variable over time, and 
to explore the interindividual and intraindi- 
vidual predictors of this change. Growth curve 
modeling is a type of multilevel modeling, 
based on a mixed effects statistical model, 
which treats multiple observations as nested 
within individuals. Growth curve modeling has 
numerous statistical advantages for analyzing 
longitudinal data. In particular, it can handle 
missing data and longitudinal designs where 
observations occur at different time points 
across individuals. Growth curve models can 
be fit with any statistical software that includes 
mixed effects or multilevel modeling proce- 
dures. Multilevel growth curve modeling is one 
of the most powerful and flexible ways to ana- 
lyze longitudinal data. 


1 Introduction 


Growth curve modeling is a flexible and power- 
ful way to analyze longitudinal data. The term 
“multilevel growth curve” recognizes the fact 
that growth curve modeling is a type of mul- 
tilevel model where observations are nested 


within individual cases. The use of the term 
growth curve arises out of psychology, where 
this type of multilevel modeling was first used 
to describe developmental growth of a vari- 
ety of psychological characteristics (Bryk and 
Raudenbush, 1987; McArdle and Nesselroade, 
2002). However, as we shall see, growth curve 
modeling can be applied to any form of longi- 
tudinal data where the interest is in change— 
no formal conception of a “growth” process is 
required. 

Multilevel growth curve modeling uses a 
mixed effects general linear modeling approach 
to estimate the statistical model. Growth curve 
models can also be estimated using a latent con- 
struct approach via structural equation model- 
ing (SEM) software. This SEM approach is not 
covered in this chapter. Interested readers can 
see the excellent introduction to latent growth 
curve models by Terry Duncan and his col- 
leagues (1999). See also Stoolmiller, Chapter 31, 
in this volume. 

Multilevel growth curve analysis has a number
of important strengths. First, it allows flexible
statistical modeling that can more closely
match the underlying longitudinal theoretical
framework. In particular, multilevel growth
curve modeling can help disentangle questions
about interindividual predictors (e.g., Do
children who go to pre-school show quicker
mastery of reading skills in elementary school
compared to children who do not go to pre-school?)
from intraindividual predictors (e.g.,
Does receipt of positive feedback speed up the
acquisition of particular reading skills for
individual children?).

The primary purpose of this chapter is to 
introduce growth curve modeling techniques in 
an applied way so that the interested reader 
can see their potential for longitudinal data 
analysis. In the next section some basic con- 
siderations about research design and data 
management in growth curve modeling are 
discussed. Following this, longitudinal data 
from the National Longitudinal Survey of Youth 
are used to illustrate the basic steps in building 
and evaluating a growth curve model. Finally, 
the chapter concludes with a short appendix 
covering software that can be used to fit growth 
curve models. 


2 Research design and data 
management 


Growth curve models are based on longitudinal 
data from longitudinal research designs. Longi- 
tudinal data are made up of observations of one 
or more dependent and independent variables 
which are measured on the same individuals 
at multiple points in time. Technically, pre- 
post data that are obtained at two different time 
points are longitudinal. However, no real lon- 
gitudinal research questions can be addressed 
with such data. Questions about the form of 
change over time, in particular, can only be 
answered with data that are measured at three 
or more time points (Singer and Willett, 2003). 

Longitudinal data may be obtained from 
either experimental or observational studies. 
For example, a clinical trial study of the effects 
of an educational campaign designed to pro- 
mote screening for prostate cancer would col- 
lect longitudinal data from participants over 
time after enrolling in the study. The primary 
hypothesis would be that participants receiv- 
ing the new educational materials would show 


higher rates of screening over time than par- 
ticipants in a control condition. Longitudi- 
nal observational studies are also extremely 
common. For example, using health surveys 
of adolescents, investigators could track sub- 
stance use patterns over time. One longitu- 
dinal hypothesis could be that students who 
transfer schools may show steeper increases 
in substance use over time than students who 
remain in the same school. From the perspec- 
tive of the statistical analyst, there is no differ- 
ence between experimental and observational 
longitudinal data. The primary difference is 
in the interpretation of the results—e.g., much 
stronger claims for causality may be made for 
experimental longitudinal data than from obser- 
vational data. 


2.1 Data management 


Data management for growth curve and other 
types of longitudinal data analysis can become 
somewhat complicated. However, all longitu- 
dinal datasets will have certain core features. 
First, longitudinal datasets will have five basic 
types of variables: an ID variable, one or more 
longitudinal dependent variables, one or more 
variables containing time information, time- 
varying predictors, and time invariant predic- 
tors. A dataset used for growth modeling will 
always have at least the first two types, but 
the presence of the different types of predictor 
variables will depend on the study design and 
research questions. 

Although longitudinal data are often initially
collected and stored in different data files,
eventually the data will be brought together for
analysis. There are two common formats for storing
longitudinal data, illustrated in Table 32.1.
In the “wide” data format, each record in
the database is a separate individual. Multiple
observations on the same individual are stored
in different variables (e.g., weight1, weight2,
etc.) in the same case. However, most multilevel
software packages will expect to see longitudinal
data in a different format, where data
are stored in one observation per record. In
this “tall” format each observation gets its own
record, and any longitudinal data are stored
in one variable (e.g., weight). Most general-purpose
statistical packages provide routines
that can relatively easily restructure the data
from one format to the other. Notice that in the
observation record format, the multilevel structure
of the longitudinal data is apparent: multiple
observations are nested within individuals
(see below).
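
The restructuring from the individual record (wide) format to the observation record (tall) format can be sketched with plain dictionaries. This is a minimal illustration using the two example cases from Table 32.1; the function name and its arguments are assumptions made for the sketch, and a statistical package's own restructuring routine would normally be used instead.

```python
def wide_to_tall(records, n_waves, time_varying=("Age", "Weight")):
    """Turn one record per individual into one record per observation."""
    tall = []
    for record in records:
        for wave in range(1, n_waves + 1):
            # Time-invariant fields are copied onto every observation record.
            row = {"ID": record["ID"], "Time": wave, "Gender": record["Gender"]}
            # Time-varying fields such as Weight1, Weight2, ... collapse
            # into a single Weight variable.
            for name in time_varying:
                row[name] = record[f"{name}{wave}"]
            tall.append(row)
    return tall

wide_records = [
    {"ID": "001", "Gender": "M", "Age1": 12, "Weight1": 125,
     "Age2": 13, "Weight2": 129, "Age3": 14, "Weight3": 137},
    {"ID": "002", "Gender": "F", "Age1": 12, "Weight1": 101,
     "Age2": 13, "Weight2": 103, "Age3": 14, "Weight3": 108},
]

tall_records = wide_to_tall(wide_records, n_waves=3)
for row in tall_records:
    print(row)
```

The two wide records become six tall records, three nested within each individual, which is the structure a multilevel model expects.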


2.2 Introduction to the NLSY97 dataset 


The data used to provide examples for this 
chapter are taken from the National Longitudi- 
nal Survey of Youth 1997 Cohort (NLSY97). The 
NLSY97 is part of a series of surveys funded by 
the US Bureau of Labor Statistics and designed 
to gather longitudinal data on the labor mar- 
ket experiences of US youth and adults. The 
NLSY97 examines the transition from school to 
work for a nationally representative sample of 


youth who were born from 1980 to 1984. The 
youths were ages 12 to 17 during the first wave 
of data collection. 8984 participants were inter- 
viewed in 1997, and annual interviews were 
conducted for the next seven years. The sample 
size for round 7 was 7756, and the overall reten- 
tion rate was 86.3%. The NLSY97 collected 
information on a wide variety of educational, 
work, and health areas. With the large sample 
size, number of variables, and up to seven time 
points for each participant, the NLSY97 is an 
ideal data source for exploring growth curve 
modeling. See http://www.bls.gov/nls for more 
information. 

For this chapter, data were extracted and 
downloaded from the complete seven-year 
NLSY97 public dataset. We will be focusing 
on developing growth models for two depen- 
dent variables: BMI and Total Substance Use 
Days. BMI is the body mass index and is an 
important risk factor for a wide variety of health 
conditions related to obesity. BMI was not measured
directly in the NLSY97, but is based on
self-reported measures of height and weight.


Table 32.1 Comparison of the “individual record” (wide) and
“observation record” (tall) data structures

Individual record structure

ID    Gender   Age1   Weight1   Age2   Weight2   Age3   Weight3
001   M        12     125       13     129       14     137
002   F        12     101       13     103       14     108

Observation record structure

ID    Time   Gender   Age   Weight
001   1      M        12    125
001   2      M        13    129
001   3      M        14    137
002   1      F        12    101
002   2      F        13    103
002   3      F        14    108


The formula for BMI is

\[
\text{BMI} = \left( \frac{\text{Weight in pounds}}{(\text{Height in inches}) \times (\text{Height in inches})} \right) \times 703
\]
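The BMI calculation can be sketched in a few lines of Python. The function name and the example height of 60 inches are hypothetical; only the weight of 125 pounds is borrowed from Table 32.1, and the constant 703 is the standard conversion factor for pounds and inches.

```python
def bmi_from_pounds_inches(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return weight_lb / (height_in * height_in) * 703


# Hypothetical respondent: 125 pounds, 60 inches tall.
example_bmi = bmi_from_pounds_inches(125, 60)
print(round(example_bmi, 2))
```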


The NLSY97 asked youth to report the num- 
ber of days in the past 30 days that they had 
used alcohol, marijuana, or smoked cigarettes. 
We combined these three measures to form a 
total substance use risk variable, called Sub- 
stance Use Days, that can range from 0 to 
90. The higher the number, the more often 
the youth is reporting using substances in the 
past month. We will use hierarchical linear 
modeling to examine how each of these variables
changes over time as youths age, and we
will also explore how certain covariates predict 
interindividual differences in change patterns 
over time. The covariates include individual 
characteristics such as gender and race, as well 
as one important time-varying predictor, tran- 
sition to a new school. 


3 Building the multilevel 
growth model 


3.1 Framing a growth curve model as a 
multilevel model 


In a traditional regression model, variability 
of the dependent variable is accounted for 
either by the predictor variables, or else put 
into an undifferentiated individual error term. 
In a multilevel statistical model, as the name 
suggests, we are able to partition variability 
across multiple levels. So, for example, if we 
want to understand reading achievement by stu- 
dents in multiple classrooms, using a multilevel 
model we can account for variability that exists 
between students (level 1) and also variability 
between classrooms (level 2). That is, students 
are nested in classrooms, and we can build sta- 
tistical models that reflect that reality. 


Growth curve models are simply a spe- 
cial type of multilevel model. Here, multi- 
ple observations across time are nested within 
individuals. As we stated earlier, a principal 
advantage of multilevel modeling is its abil- 
ity to account for nonindependence of obser- 
vations due to nesting. So, just as we might 
expect students in the same classroom to be 
more similar to one another than would be 
expected by chance (thus violating the tradi- 
tional independence assumption), we certainly 
expect multiple observations of the same person 
to be more alike. Growth curve modeling using 
hierarchical linear models can appropriately 
account for this nonindependence of observa- 
tions across time. 

The following system of equations shows 
a basic growth curve model as a multilevel 
model: 


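\[
\begin{aligned}
Y_{ti} &= \beta_{0i} + \beta_{1i} T_{ti} + \varepsilon_{ti} \\
\beta_{0i} &= \gamma_{00} + u_{0i} \\
\beta_{1i} &= \gamma_{10} + u_{1i}
\end{aligned}
\]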


Here we are modeling some dependent variable 
Y, measured at time t on individual i. The first
line of the model can be considered as the level 
1 portion of the model, and looks similar to 
a typical multiple regression model. The only 
level 1 predictor included in this basic growth 
curve model is T, which is the time variable. 
For this reason this model is sometimes called 
an unconditional linear growth curve model. It 
is unconditional in that there are no predic- 
tors, other than the time variable. It is linear 
in that $\beta_{1i}$ captures only the linear relationship
between time and the dependent variable. 

The most important difference between the 
above model and a traditional multiple regres- 
sion model can be seen by considering the pres- 
ence of the i subscripts in the level 1 portion of 
the model. Both the intercept and slope betas 
have i subscripts, indicating that we are allow- 
ing the intercepts and slopes to vary across indi- 
viduals. That is, each individual in the dataset 
is allowed to have his or her own growth curve!
For this reason, we often call the level 1 part 
of the growth curve model the intraindividual 
part of the model. 

In a multilevel model, the parameters in the 
first level of the model become outcomes in 
the second level of the model. In the unconditional
linear growth model, the intercept for
a particular individual ($\beta_{0i}$) is predicted by
the grand mean of all the individual intercepts
($\gamma_{00}$) plus the variability of the individual
intercepts around the grand mean ($u_{0i}$). Similarly,
the slope for a particular individual ($\beta_{1i}$) is
predicted by the grand mean of all the individual
slopes ($\gamma_{10}$) plus the variability of the individual
slopes around the grand mean ($u_{1i}$). The
level 2 part of the model is called the interindi- 
vidual part of the model, because it can be 
used to model predictors of change between 
individuals. 

Instead of using a system of equations to 
specify the multilevel model, we can substitute 
the level 2 parts of the model into the level 1 
equation. After substituting and rearranging the 
terms, we get the following: 


\[
Y_{ti} = \underbrace{[\gamma_{00} + \gamma_{10} T_{ti}]}_{\text{fixed}} + \underbrace{[u_{0i} + u_{1i} T_{ti} + \varepsilon_{ti}]}_{\text{random}}
\]

This single prediction equation form of the 
multilevel model is called the mixed effects 
model, because it shows how the model is based 
on both fixed effects (the gammas, $\gamma$) and random
effects (the variance components associated with
$\varepsilon$ and $u$).
Although it is harder to discern the multilevel
structure of the model when it is in this form, it
more clearly states what components are actually
being modeled. This form of the model
also closely corresponds to the output of the 
various multilevel modeling software packages. 
It is advantageous to be able to construct and 
interpret multilevel models using both types of 
equations. Fortunately, hierarchical linear mod- 
eling software such as HLM allows you to see 
the models in both forms. 
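To see that the two forms describe the same model, the short Python sketch below computes one observation both ways: first through the level 2 and level 1 equations, and then through the single mixed-effects equation. All numeric values are hypothetical and chosen only for illustration; the function names are likewise assumptions of this example.

```python
import math


def two_level(gamma00, gamma10, u0, u1, t, e):
    """Level 2 builds the individual's betas; level 1 uses them."""
    beta0 = gamma00 + u0          # individual intercept
    beta1 = gamma10 + u1          # individual slope
    return beta0 + beta1 * t + e


def mixed_form(gamma00, gamma10, u0, u1, t, e):
    """Single-equation form: fixed part plus random part."""
    fixed = gamma00 + gamma10 * t
    random_part = u0 + u1 * t + e
    return fixed + random_part


# Hypothetical values: grand-mean intercept and slope, one person's
# deviations from them, a time point, and a level 1 residual.
args = dict(gamma00=20.0, gamma10=0.5, u0=1.5, u1=-0.1, t=3, e=0.2)
y_two_level = two_level(**args)
y_mixed = mixed_form(**args)
print(y_two_level, y_mixed)
```

The two functions differ only in how the terms are grouped, which is the point of the substitution described above.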

The unconditional linear growth model, pre- 
sented above, is only one of an innumerable set 


of possible growth curve models. This simple 
model can be extended by adding predictors at 
either of the levels of the model, as well as by 
making decisions about which random effects 
to include in the model. It is difficult at this 
point to know how to make these decisions. So 
in the next two sections we will look at how 
to extend the model by considering two funda- 
mental questions about growth curve models. 
First, how can we describe the form of change 
over time? Second, what factors influence intra- 
and interindividual patterns of change? 


3.2 Describing the form of change 


A starting point for most growth curve mod- 
els is to describe the form or shape of change 
in the dependent variable of interest. The pur- 
pose of this first step may simply be descriptive, 
or it might be to address a specific scientific 
question (e.g., “Does the increase in BMI during 
adolescence follow a quadratic form?”). 

Before jumping into fitting and testing spe- 
cific multilevel growth models, it is advisable 
to spend some time thinking theoretically about 
expected patterns of change. This can help 
guide the often complicated process of model 
selection. In addition, it is always a good idea to 
examine the data to see what the individual raw 
growth curves look like. Figure 32.1 presents 40 
randomly chosen plots of the raw growth curves 
of BMI from the NLSY97 data. It is apparent that 
BMI levels vary substantially between youth. 
However, it appears that for many of the youth, 
BMI tends to go up as they get older. Although 
the patterns are not consistent across youth, it 
appears that changes in BMI may not proceed 
in a simple linear fashion. So we might want to 
examine more complicated growth models that 
describe nonlinear change. 

Figure 32.1 Individual growth curves of BMI by age for 40 random cases

Figure 32.1 also reveals that a number of
the participants do not have measurements for
all seven time points. In fact, just in this random
sample we see one person with only one
measurement, and a couple of people with
only two measurements. Very close examination
of the figure also reveals that the period
between measurements varies across individ- 
uals. That is, although youth are interviewed 
on average once a year in the NLSY97, the 
actual time between interviews can be quite 
a bit shorter or longer for particular individu- 
als. These two common aspects of real world 
longitudinal datasets, varying number of mea- 
surements and varying time between measure- 
ments, pose severe or even fatal challenges for 
traditional longitudinal statistical approaches 
such as repeated measures ANOVA. However, 
multilevel modeling can handle this type of 
“messy” data without any problem. This is, in 


fact, one of the primary reasons that multilevel 
modeling is now a preferred analytic approach 
for growth curve models. 

Unless you have a very specific hypothe- 
sis about the form of the change, a reasonable 
approach to model building is to start simple, 
and then build more complex models. For this 
reason, we start by fitting the above uncondi- 
tional linear growth model to BMI: 


\[
\begin{aligned}
\text{BMI}_{ti} &= \beta_{0i} + \beta_{1i} (\text{Age12})_{ti} + \varepsilon_{ti} \\
\beta_{0i} &= \gamma_{00} + u_{0i} \\
\beta_{1i} &= \gamma_{10} + u_{1i}
\end{aligned}
\]

In this model we are predicting BMI scores
for individuals across time. Time is measured 
using the age of the youth at the time of the 
interview after subtracting 12. (We will explain 
the reason for this below.) Both the intercept 
and time slope are allowed to vary across indi- 
viduals. With this model we will be able to 
see how much BMI goes up or down as youths 
age. Given our examination of the raw growth 
curves, we might expect the time slope to be 
positive. 

The results of this model are presented under 
Model 1 in Table 32.2. This table summarizes 
much of the useful information from a growth 
model. The top of the table presents estimates 
of the fixed effects. The coefficients are esti- 
mates of the two gammas. The average inter- 
cept across all individuals is 20.63. This means 
that the expected BMI for any individual when 
the age variable is 0 is approximately 21. This 
helps explain why we subtracted 12 from the 
age at interview variable. If we used the raw age 
at interview, then the estimate of the intercept 
would be interpreted as the expected BMI score 
when a person was age 0 (i.e., a newborn). This, 
of course, is not a useful or interpretable esti- 
mate. By subtracting 12 from each age at inter- 
view score, we get a new interpretation of the 
intercept—the expected BMI value at age 12. 
We picked age 12 because this is approximately 
the youngest age for which there are data in the 
NLSY97 dataset. This is an example of center- 
ing a predictor variable. There has been a lot 
written about centering variables in multilevel 
models (see, e.g., Paccagnella, 2006). Although 
the topic can get quite complicated, the most 
important reason to center predictor variables is 
to produce fixed effects estimates that are more 
interpretable than would otherwise be the case. 

The linear fixed effect estimate of 0.54 tells us 
that for each additional year of age, we would 
expect BMI to increase by about half a point 
for each individual. This suggests that dur- 
ing the teen and young adult years, youth are 
becoming more overweight as they age. Along 


with the coefficient estimates, their associated 
standard errors, t-tests, and p-values are pre- 
sented. Hypothesis testing of the individual 
fixed effects parameters can thus be done using 
traditional methods. 

The middle rows of Table 32.2 present the 
random effects part of the model. These are 
presented in the form of variance components, 
and can be thought of as unmodeled variability. 
The variance component of 16.64 tells us that 
there is a large amount of variability of indi- 
vidual BMI scores around the average starting 
point of 20.63. This confirms what we saw in 
Figure 32.1, where we saw some people with 
quite low BMI scores, and others with high BMI 
scores. The much smaller linear variance com- 
ponent of 0.23 suggests that there is much less 
variability of the slopes across individuals. One
way to view this is that much more variability
remains to be modeled (with predictor
variables) in the level of BMI than in
the slope of BMI on age. Finally, the level 1
variance component estimate of 3.31 suggests 
that there is a moderate amount of intraindivid- 
ual variability. This suggests that the individual 
observations may be bouncing around the linear 
regression line. This could be due to instability 
of the BMI measurements. Another possibility 
is simply that the simple linear growth model 
is not a good fit with the data. 
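One way to make the sizes of these variance components concrete is to convert them into approximate ranges of individual intercepts and slopes. The sketch below takes the Model 1 estimates from Table 32.2 and assumes that the random effects are roughly normally distributed, which is the usual modeling assumption rather than something verified here; the function name is hypothetical.

```python
import math


def plausible_range(mean, variance, z=1.96):
    """Approximate central 95% range, assuming normal random effects."""
    sd = math.sqrt(variance)
    return mean - z * sd, mean + z * sd


# Model 1 estimates from Table 32.2.
intercept_lo, intercept_hi = plausible_range(mean=20.63, variance=16.64)
slope_lo, slope_hi = plausible_range(mean=0.54, variance=0.23)

print(f"BMI at age 12: {intercept_lo:.1f} to {intercept_hi:.1f}")
print(f"Yearly change: {slope_lo:.2f} to {slope_hi:.2f}")
```

Under that assumption, most individual starting points would fall within a band of roughly sixteen BMI points, while most individual slopes would fall within a band of less than two points per year.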

In addition to the variance component esti- 
mates, some multilevel modeling software 
packages will produce statistical tests of these 
components. Here we see the chi-square tests 
and associated p-values produced by HLM. 
However, these statistical tests should be 
viewed with caution. First, variance compo- 
nents are bounded at 0, so their distributions are 
not normal. Second, it is not clear exactly what 
the meaning of a significant variance compo- 
nent should be—after all, we generally expect 
variances to be nonzero. Rather than focusing 
on the p-values of the variance components, 
it is usually more fruitful to interpret their
sizes rather than their significance (Pinheiro
and Bates, 2000).


Table 32.2 Three growth models for change of BMI

                              Model 1 - Linear                        Model 2 - Quadratic                     Model 3 - Cubic
Fixed effects                 Coef.     SE        t-ratio   p         Coef.     SE        t-ratio   p         Coef.     SE        t-ratio   p
Intercept ($\gamma_{00}$)     20.63     0.050     413.7     0.000     20.172    0.067     302.7     0.000     19.77     0.088     225.5     0.000
Linear ($\gamma_{10}$)        0.54      0.006     84.6      0.000     0.722     0.020     36.2      0.000     1.00      0.046     21.4      0.000
Quadratic ($\gamma_{20}$)                                             -0.016    0.002     -9.7      0.000     -0.07     0.008     -8.2      0.000
Cubic ($\gamma_{30}$)                                                                                         0.003     0.000     6.4       0.000

Random effects                Variance  $\chi^2$  p                   Variance  $\chi^2$  p                   Variance  $\chi^2$  p
                              component                               component                               component
Intercept ($u_{0i}$)          16.64     39260     0.000               16.54     14437     0.000               16.46     14381     0.000
Linear ($u_{1i}$)             0.23      26028     0.000               0.93      10439     0.000               0.91      10384     0.000
Quadratic ($u_{2i}$)                                                  0.006     10403     0.000               0.006     10360     0.000
Cubic ($u_{3i}$)*                                                                                             -         -         -
Level 1 ($\varepsilon_{ti}$)  3.31                                    3.10                                    3.10

Model fit
Deviance                      260895.1                                260257.6                                260209.0
Parameters                    6                                       10                                      11
AIC                           260907.1                                260277.7                                260231.0
BIC                           260960.6                                260366.7                                260329.0

*Cubic effect set to fixed to avoid convergence problems.


3.3 Assessing nonlinear change patterns


Model 1 tells us that BMI increases with age, 
but there is still a lot of variability both across 
and within individuals. Figure 32.1 suggested 
that increases in BMI may not be strictly lin- 
ear in form, so the next step is to build models 
that will assess the extent to which the form 
of the change of BMI is nonlinear. There are 
a number of ways of building such curvilin- 
ear models. One of the simpler approaches is 
to build a polynomial growth model by adding 
quadratic, cubic, quartic terms, and so on, to 
the base linear model. For a dataset with k time 
points, in principle k-1 polynomial terms can 
be fit. However, in practice growth models are 
rarely built that go beyond cubic or quartic com- 
ponents. First, in most areas of the social and 
health sciences theories are not rich enough to 
suggest or explain such high-level polynomial 
models. Second, in many real-world datasets 
quadratic or cubic models explain most of the 
intraindividual variability, and it is unusual to 
have underlying variability that requires more 
complicated models. 

To fit polynomial models, you simply add the 
appropriate time variable raised to the degree 
of the polynomial. So, a quadratic model would 
include Time, and Time-squared. A cubic 
model would include Time, Time-squared, and 
Time-cubed, and so on. The following equation 
is for a quadratic polynomial model of change 
of BMI: 


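\[
\begin{aligned}
\text{BMI}_{ti} &= \beta_{0i} + \beta_{1i} (\text{Age12})_{ti} + \beta_{2i} (\text{Age12})^2_{ti} + \varepsilon_{ti} \\
\beta_{0i} &= \gamma_{00} + u_{0i} \\
\beta_{1i} &= \gamma_{10} + u_{1i} \\
\beta_{2i} &= \gamma_{20} + u_{2i}
\end{aligned}
\]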


Models 2 and 3 listed in Table 32.2 present the 
results of fitting a quadratic and cubic model, 
respectively. For both models, the individual 


coefficients are highly significant, suggesting 
that a curvilinear model is more appropriate 
than a simple linear model. In polynomial mod- 
els, the meaning of the coefficients changes. 
For example, in a quadratic model, the linear 
coefficient for time no longer represents a con- 
stant change rate. Instead, it now represents 
the instantaneous rate of change at the point 
that time = 0. The quadratic coefficient tells us 
how fast the instantaneous rate of change itself 
changes. This can be thought of as a curvature 
parameter (Singer and Willett, 2003). Instead 
of interpreting each coefficient individually in 
a polynomial model, it is often more informa- 
tive to plot the prediction curves for the model 
based on the fitted coefficients. Figure 32.2 
presents the prediction curves for BMI for the 
three models presented in Table 32.2. 
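The prediction curves, and the instantaneous rate of change described above, can be computed directly from the fixed effects in Table 32.2. The Python sketch below is one way to do this; the function names are assumptions of the example, and because the published coefficients are rounded, the results are approximate.

```python
# Fixed-effect estimates from Table 32.2, ordered by polynomial degree.
BMI_MODELS = {
    "linear": [20.63, 0.54],
    "quadratic": [20.172, 0.722, -0.016],
    "cubic": [19.77, 1.00, -0.07, 0.003],
}


def predict_bmi(coefs, age, center=12):
    """Predicted BMI for the average individual at a given age."""
    t = age - center
    return sum(c * t ** k for k, c in enumerate(coefs))


def rate_of_change(coefs, age, center=12):
    """Instantaneous rate of change (first derivative) at a given age."""
    t = age - center
    return sum(k * c * t ** (k - 1) for k, c in enumerate(coefs) if k > 0)


for age in (12, 15, 18, 22, 24):
    row = {name: round(predict_bmi(c, age), 2) for name, c in BMI_MODELS.items()}
    print(age, row)

# In the quadratic model the linear coefficient is the rate at age 12,
# and the rate declines as age increases.
print(rate_of_change(BMI_MODELS["quadratic"], 12))
print(rate_of_change(BMI_MODELS["quadratic"], 22))
```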

The figure shows that BMI increases steadily 
as youths age. The curvilinear models both sug- 
gest that BMI rises faster at early ages, from 
about 12 to 15. The quadratic model suggests 
that after about the age of 22 the increase in BMI 
starts slowing down. To see how these models 
fit the data at the two age extremes, individ- 
ual marks were added to the plot that repre- 
sent the raw average BMI scores for that age 
group. These marks suggest that the quadratic
and cubic models fit the data pretty well for the
early ages, while the quadratic model does a
better job for young adults. (The means for the
youngest, age 12, and oldest, age 24.5, groups
are based on very small numbers of cases, so
they should be interpreted with caution.) An
interesting thing to note about these models is
that from about the ages of 15 to 22, all three
models would lead to virtually the same predicted
values.


Figure 32.2 BMI prediction curves for linear,
quadratic, and cubic growth curve models


Table 32.3 Three growth models for Substance Use Days

                              Model 1 - Linear                        Model 2 - Quadratic                     Model 3 - Cubic
Fixed effects                 Coef.     SE        t-ratio   p         Coef.     SE        t-ratio   p         Coef.     SE        t-ratio   p
Intercept ($\gamma_{00}$)     0.030     0.161     0.2       0.852     -2.846    0.174     -16.3     0.000     -0.145    0.214     -0.7      0.497
Linear ($\gamma_{10}$)        1.655     0.030     55.4      0.000     2.777     0.083     33.44     0.000     0.677     0.173     3.9       0.000
Quadratic ($\gamma_{20}$)                                             -0.092    0.007     -13.1     0.000     0.326     0.035     9.3       0.000
Cubic ($\gamma_{30}$)                                                                                         -0.024    0.002     -11.9     0.000

Random effects                Variance  $\chi^2$  p                   Variance  $\chi^2$  p                   Variance  $\chi^2$  p
                              component                               component                               component
Intercept ($u_{0i}$)          80.75     15080     0.000               27.25     6486      >0.500              27.11     6426      >0.500
Linear ($u_{1i}$)             4.26      21205     0.000               26.51     9218      0.000               26.35     9147      0.000
Quadratic ($u_{2i}$)                                                  0.162     9579      0.000               0.161     9508      0.000
Cubic ($u_{3i}$)*                                                                                             -         -         -
Level 1 ($\varepsilon_{ti}$)  100.24                                  90.02                                   89.74

Model fit
Deviance                      431258.6                                428258.7                                428116.0
Parameters                    6                                       10                                      11
AIC                           431270.6                                428278.7                                428138.0
BIC                           431324.0                                428367.8                                428236.0

*Cubic effect set to fixed to avoid convergence problems.

Table 32.3 and Figure 32.3 present the results 
for the same set of growth curve models applied 
to the Substance Use Days dependent variable. 
As youths age, we see that the number of sub- 
stance use days also goes up. Again, we find that 
there are significant curvilinear components to 
the change over time. Figure 32.3 shows that a 
linear model does a particularly poor job of pre- 
dicting substance use for the oldest members 
of the sample. Conversely, the quadratic model 
gives impossible predictions for kids aged 12 or 
13. The cubic model may do the best job, and 
it describes a type of S-curve that seems rea- 
sonable for this dependent variable. When kids 
are very young, substance use is near zero and 
changes slowly. During the teen years substance 
use increases significantly, but as adulthood 
approaches, substance use appears to level off. 
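The out-of-range predictions mentioned above can be checked by evaluating the fixed effects from Table 32.3 at a few ages. The sketch below is illustrative only: the function names are hypothetical, the coefficients are the rounded published values, and the 0 to 90 bounds come from the definition of the Substance Use Days variable.

```python
# Fixed-effect estimates from Table 32.3, ordered by polynomial degree.
SUD_MODELS = {
    "linear": [0.030, 1.655],
    "quadratic": [-2.846, 2.777, -0.092],
    "cubic": [-0.145, 0.677, 0.326, -0.024],
}


def predict_sud(coefs, age, center=12):
    """Predicted Substance Use Days for the average individual."""
    t = age - center
    return sum(c * t ** k for k, c in enumerate(coefs))


def in_valid_range(value, low=0.0, high=90.0):
    """Substance Use Days is defined only on the interval 0 to 90."""
    return low <= value <= high


for age in (12, 13, 16, 20, 24):
    for name, coefs in SUD_MODELS.items():
        pred = predict_sud(coefs, age)
        flag = "" if in_valid_range(pred) else "  <- outside 0 to 90"
        print(f"age {age:2d}  {name:9s} {pred:7.2f}{flag}")
```

With these rounded coefficients, the quadratic model yields negative predicted values at ages 12 and 13, and the cubic model predicts fewer days at age 24 than the linear model does.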


3.4 Model diagnostics, fit and selection 


In addition to examining individual parameters 
and their associated p-values, it is usual to exam- 
ine model diagnostics to see how well the fitted 
model matches its underlying assumptions, 
and then to examine various fit indices 
to see how well the overall model fits the data. 
Diagnostics for growth curve models of quan- 
titative dependent variables are very similar 
to those examined for multilevel modeling. 
Two common assumptions that can be eas- 
ily checked are normality of errors (residu- 
als) and homoscedasticity. Figure 32.4 shows 
two diagnostic plots on a random 2% sam- 
ple (~1000 cases) of the quadratic BMI growth 
model (Model 2 from Table 32.2). The Q-Q 
plot on the left side tells us that although the 


residuals are symmetric, they are more kur- 
totic (higher central peak, smaller tails) than 
we would expect with independent and nor- 
mally distributed errors. This suggests that our 
model may require a more complex covariance 
structure than we assumed. (Our model was 
fit assuming a compound symmetry covariance 
matrix. For details on how to fit growth curve 
models with other covariance structures, see 
Singer and Willett, 2003.) On the right side we
plot the residuals against the fitted (predicted) 
BMI values. This plot shows no evidence of a 
fan shape, and strongly suggests that this model 
does not have problems with heteroscedastic- 
ity. Luke (2004) provides more examples of 
how to use graphical exploration of residuals to 
examine the assumptions of multilevel models. 

Growth curve models for quantitative depen- 
dent variables are typically fitted using some 
form of maximum-likelihood estimation (Laird, 
1978). Simply stated, this type of estimation 
works by maximizing a likelihood function that 
assesses the joint probability of simultaneously 
observing all of the sample data, assuming a 
certain set of fixed and random effects. An 
important product of the estimation process is 
a number obtained by multiplying the natural
log of the likelihood by -2. This number,
sometimes called the deviance or designated as
-2LL, is a measure of the discrepancy between
the observed data and the fitted model. The
deviance for any one model cannot be interpreted
directly, but it can be used to compare
multiple models to one another.


Figure 32.3 Substance Use Days prediction curves
for linear, quadratic, and cubic growth curve models

Figure 32.4 Two diagnostic plots for the quadratic BMI growth curve model

The model comparison can be done between 
two models fit to the same data, where one of 
the models is a subset (has fewer parameters) of 
the other. The difference of the deviances from 
each model is distributed as a chi-square statis- 
tic with degrees of freedom equal to the dif- 
ference in the number of parameters estimated 
in each model. For example, we can compare 
Model 3 to Model 1 for BMI (Table 32.2) to 
see if the nonlinear change model is better than 
the simpler linear change model. The difference 
between the two deviances is 686.1 (260895.1 -
260209.0); this value is highly significant with
df = 5 (11 - 6). This tells us that the more complicated
model is a significantly better fit to
the data. 
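The deviance comparison can be reproduced with a few lines of Python. The sketch below uses only the standard library; chi2_sf is a hypothetical helper that evaluates the upper-tail chi-square probability with the closed-form expressions available for integer degrees of freedom, and the deviances and parameter counts are those reported for Models 1 and 3 in Table 32.2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    total = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def deviance_test(dev_simple, k_simple, dev_complex, k_complex):
    """Deviance difference, its degrees of freedom, and the p-value."""
    diff = dev_simple - dev_complex
    df = k_complex - k_simple
    return diff, df, chi2_sf(diff, df)


# BMI Model 1 (linear) versus Model 3 (cubic), from Table 32.2.
diff, df, p_value = deviance_test(260895.1, 6, 260209.0, 11)
print(round(diff, 1), df, p_value)
```

The test applies only to nested models fit to the same data. Because some of the added parameters are variance components, which are bounded at 0, the reference distribution is itself an approximation.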


One disadvantage of the deviance (-2LL)
is that, of two nested models fit to the same
data, the one with more parameters will never
have the larger deviance. A smaller deviance
implies a better fit to the data, but
we can always get a better fit by adding
more predictors. We also want to choose the
simplest model that describes the data; i.e.,
the model with the fewest parameters. Two
widely used fit indices have been developed 
that are based on the deviance, but incorpo- 
rate penalties for a greater number of param- 
eters: The Akaike Information Criterion (AIC) 
and Schwarz’s Bayesian Information Criterion 
(BIC) (Akaike, 1987; Schwarz, 1978). For both of 
these indexes, smaller is better. Also, an impor- 
tant advantage of these two criteria is that they 
can be used to compare two models fit to the 
same dataset, even if one is not a subset of the 
other. The AIC and BIC are listed in Tables 32.2 
and 32.3 for our change models. In both cases 
these criteria indicate that the cubic models are
better models than the simpler linear models. 
For more details on how to calculate and use 
AIC and BIC, see Luke (2004). 
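As a brief sketch of how the two criteria penalize the deviance, the Python functions below use the standard definitions: AIC adds 2 per estimated parameter, and BIC adds the natural log of the sample size per parameter. The chapter does not state which sample size the software used, so the last function simply backs out the value of ln(N) implied by a reported BIC; that step is an inference made for this example, not a documented property of the software.

```python
import math


def aic(deviance, n_params):
    """Akaike Information Criterion: deviance plus 2 per parameter."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n_obs):
    """Bayesian Information Criterion: deviance plus ln(N) per parameter."""
    return deviance + n_params * math.log(n_obs)


def implied_log_n(bic_value, deviance, n_params):
    """Back out the ln(N) penalty implied by a reported BIC."""
    return (bic_value - deviance) / n_params


# BMI Models 1 and 3 from Table 32.2.
aic_linear = aic(260895.1, 6)
aic_cubic = aic(260209.0, 11)
log_n = implied_log_n(260960.6, 260895.1, 6)

print(round(aic_linear, 1), round(aic_cubic, 1))
print(round(log_n, 2), round(math.exp(log_n)))
```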


3.5 Different choices for scaling time 


The concept of time is critical for growth curve 
models. Therefore, it is also critical to think 
carefully about how to operationalize, mea- 
sure, and model time in a growth curve model. 
Time can be defined many ways for any par- 
ticular study—and these different conceptions 
may not all equally represent the theory or 
research question under consideration, and dif- 
ferent operationalizations of time may lead to 
different model results. For example, consider 
Table 32.4, which shows four different ways 
that time may be assessed for the NLSY97 
study: the interview wave (from 1 to 7), the 
actual age at the time of the interview, the age 
after subtracting 12, and the grade that the stu- 
dent is in. If the investigator is interested in the 
underlying physiological and cognitive changes 
that influence body weight, then Age or Age12 
might be appropriate conceptions of time. On 
the other hand, if one wants to understand how 


Table 32.4 Examples of different definitions 
of time 


ID Interview wave Age  Age12 Grade 
001 1 13 1 7 
001 2 15 3 8 
001 3 16 4 8 
001 4 16 4 9 
001 5 18 6 11 
001 6 19 7 13 
001 7 20 8 14 
002 1 15 3 10 
002 2 16 4 11 
002 3 16 4 12 
002 4 18 6 13 
002 5 19 7 14 
002 6 20 8 15 
002 7 21 9 15 


the changes in school environment may influ- 
ence drug use, then perhaps Grade would be a 
useful operationalization of time. However, it is 
hard to think of a research question that would 
be usefully served by using Interview Wave as a 
measure of time. This is an arbitrary time mea- 
surement that is based on the logistics of the 
study, not a physical or social reality. 

Table 32.5 presents the fixed effects results 
of a linear BMI growth model for three differ- 
ent definitions of time. The results for Model 3 
on the right side of the table are for the Age12 
variable, and are the same as displayed in 
Table 32.2. Model 1 presents a growth model 
where Interview Wave is used for time. The 
results are similar to Age12, but not identical. 
They are similar in that on average across all 
subjects each interview wave is approximately 
one year apart. Both models show that BMI 
increases about half a point a year. However, 
consideration of the AIC and BIC scores shows 
that the model with Age12 is doing a better 
job of describing the observed data. This is not 
surprising, because Age12 provides information 
not just on the order of the interviews, but 
also reflects an accurate measure of the actual 
amount of time that has passed between each 
interview for each participant. 

The difference between Models 2 and 3 is 
simpler and more subtle. Age is the raw age, 
while Age12 is raw age minus 12. Subtracting 
(or adding) a constant from a predictor vari- 
able is a way of centering the predictor vari- 
able. Centering is typically done in one of three 
ways: (1) by subtracting a meaningful constant, 
as we have done here with Age12; (2) by sub- 
tracting a grand mean; or (3) by subtracting 
a group mean. This third type of centering is
more complicated than the other two, and is relatively
uncommon for growth models (where
the group is each individual). In growth mod- 
eling, centering is typically done by subtract- 
ing a constant or grand mean, and this has two 
advantages. First, centering is typically done 
so that the interpretation of intercepts is more
meaningful.


Table 32.5 Comparison of three BMI linear growth models with different definitions of time

                              Model 1 - Interview wave                Model 2 - Age                           Model 3 - Age12
Fixed effects                 Coef.     SE        t-ratio   p         Coef.     SE        t-ratio   p         Coef.     SE        t-ratio   p
Intercept ($\gamma_{00}$)     21.72     0.047     463.9     0.000     14.12     0.108     131.1     0.000     20.63     0.050     413.7     0.000
Linear ($\gamma_{10}$)        0.58      0.007     83.0      0.000     0.54      0.006     84.6      0.000     0.54      0.006     84.6      0.000

Model fit
Deviance                      261356.6                                260895.1                                260895.1
Parameters                    6                                       6                                       6
AIC                           261368.6                                260907.1                                260907.1
BIC                           261422.0                                260960.6                                260960.6


Remember that an intercept is the
predicted value of a dependent variable when 
the predictors are all 0. Consider Models 2 and 
3 in Table 32.5. The intercept for BMI in Model 
2 is 14.12. We interpret this as the predicted 
value of BMI when a person is 0 years old. This 
interpretation is not useful—BMIs for infants 
are not defined or interpretable! If we center 
age by subtracting 12 from each score, the only 
change in the model is that the intercept is now 
20.63. This is the predicted value for a person
who is 12 years old (Age - 12 = 0). This is
much more meaningful, because it represents 
the value for the youngest persons who were 
actually included in the NLSY97 study. If we 
had centered age by subtracting the grand mean 
of age across all of the participants and time 
points, we would have a different intercept. 
This grand-mean centered intercept would be 
interpreted as the expected BMI score for a per- 
son who was the average age of all persons in 
the study. For growth models it is fairly typ- 
ical to center the time variable by subtracting 
the time at the first observation. This allows an 
interpretation of the intercept as the “starting 
point” of the growth curve. 
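The effect of subtracting a constant can be checked with a small least-squares example. The sketch below fits a straight line to the three weights recorded for ID 001 in Table 32.1, once with raw age and once with age minus 12, and then repeats the arithmetic with the rounded estimates from Table 32.5. The helper ols_line is a hypothetical single-person ordinary least-squares fit, not the multilevel estimator used in the chapter.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# ID 001 from Table 32.1.
ages = [12, 13, 14]
weights = [125, 129, 137]

raw_intercept, raw_slope = ols_line(ages, weights)
c_intercept, c_slope = ols_line([a - 12 for a in ages], weights)
print(raw_intercept, raw_slope)   # intercept refers to age 0
print(c_intercept, c_slope)       # intercept refers to age 12

# Same arithmetic with the rounded Age12 estimates from Table 32.5.
uncentered = 20.63 - 12 * 0.54
print(round(uncentered, 2))
```

The slope is unchanged and the intercept shifts by 12 times the slope. With the rounded published coefficients the shifted intercept comes out close to, though not exactly equal to, the 14.12 reported for the raw Age model.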

The second reason that centering is typically 
done in growth models has to do with problems 
of multicollinearity. If polynomial transforma- 


tions of time are used to build nonlinear growth 
models, the various time predictors (time, time- 
squared, etc.) are highly intercorrelated, and 
may lead to convergence problems, especially 
with smaller datasets. By centering all of the 
time variables the intercorrelations are reduced, 
and convergence problems will be less likely. 
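How much centering reduces the correlation between time and time-squared depends on the constant that is subtracted. The sketch below compares three versions of a hypothetical, evenly spaced age variable running from 12 to 24; the data and function name are assumptions of the example rather than NLSY97 values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ages = list(range(12, 25))                      # hypothetical ages 12..24
age12 = [a - 12 for a in ages]                  # subtract a constant
mean_age = sum(ages) / len(ages)
grand_centered = [a - mean_age for a in ages]   # subtract the grand mean

r_raw = pearson(ages, [a ** 2 for a in ages])
r_age12 = pearson(age12, [t ** 2 for t in age12])
r_grand = pearson(grand_centered, [t ** 2 for t in grand_centered])

print(round(r_raw, 4), round(r_age12, 4), round(r_grand, 4))
```

In this example, subtracting 12 lowers the correlation only modestly, while grand-mean centering removes it entirely because the centered ages are symmetric around zero.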


3.6 Identifying predictors of change 


The above presentation has focused on devel- 
oping growth curve models whose primary goal 
is to describe the form or shape of change. Typ- 
ically, however, researchers are also interested 
in developing models and testing hypotheses 
that include predictors of change. In addition to 
the level 1 Time variable, growth curve models 
can include other types of level 1 (intraindivid- 
ual) and level 2 (interindividual) covariates or 
predictors. 


3.7 Predictors of interindividual change 


In growth curve models, covariates that are con- 
stant over time, such as gender or experimental 
condition, are known as interindividual predic- 
tors. These predictors can tell us how change 
varies across different types of individuals. For 
our example, we will examine the effects of gen- 
der and ethnicity on the change in Substance 
Use Days. To start we examine whether gender
(Male) and ethnicity (NonWhite) affect the
intercept of Substance Use Days (SUD). Both of 
these predictors are binary, with 1 indicating 
male or nonwhite, respectively. (The original 
NLSY97 ethnicity variable was categorical with 
several ethnicity options. This was recoded to 
0=white, 1=nonwhite for this example.) The 
following equation shows the growth model to 
be fitted. 


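$$
\begin{aligned}
\mathrm{SUD}_{ti} &= \beta_{0i} + \beta_{1i}(\mathrm{Age12})_{ti} + \varepsilon_{ti} \\
\beta_{0i} &= \gamma_{00} + \gamma_{01}(\mathrm{Male})_{i} + \gamma_{02}(\mathrm{NonWhite})_{i} + u_{0i} \\
\beta_{1i} &= \gamma_{10} + u_{1i}
\end{aligned}
$$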

This growth model makes it clear that Male
and NonWhite are entered as level 2 predictors
for the intercept ($\beta_{0i}$) of Substance Use
Days. Note that this means for this first model
that we assume a single linear slope for SUD
on Age.

The results of fitting this first predictor model 
are shown in the left-hand side of Table 32.6. 
Both the gender and ethnicity predictors are 
highly significant. The intercept (1.39) is now 
interpreted as the predicted number of Sub- 
stance Use Days for a white female age 12. 
The gender effect (1.17) tells us that 12-year-old
males use substances about one day a month
more often, and the ethnicity effect (-4.15) tells
us that nonwhite youth use substances on about
four fewer days a month at that age.

However, the relationship
between age and substance use may not be
the same across the different gender and ethnic
groups. To test this, we can fit a more
complex model that allows the linear slope of


Table 32.6 Effects of gender and ethnicity on change in Substance Use Days

                       Model 1: Intercept effects          Model 2: Slope and intercept effects
Fixed effects          Coef.     SE      t-ratio  p         Coef.     SE      t-ratio  p
Intercept (γ00)         1.392    0.242     5.7    0.000      1.708    0.290     5.9    0.000
  Male (γ01)            1.170    0.228     5.1    0.000     -1.690    0.321    -5.3    0.000
  NonWhite (γ02)       -4.150    0.226   -18.4    0.000     -1.722    0.319    -5.4    0.000
Age12 slope (γ10)       1.661    0.030    55.6    0.000      1.582    0.050    31.8    0.000
  Male (γ11)                                                 0.738    0.058    12.6    0.000
  NonWhite (γ12)                                            -0.624    0.059   -10.6    0.000

Random effects         Variance   χ²      p                 Variance   χ²      p
                       component                            component
Intercept (u0i)         84.62     15280   0.000              79.53     15000   0.000
Linear slope (u1i)       4.27     21241   0.000               4.03     20699   0.000
Level 1 (εti)          100.07                               100.24

Model fit
Deviance               430916.6                             430648.0
Parameters             8                                    10
SUD on Age12 to vary by gender and ethnicity.
This corresponds to the following growth 
model: 


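$$
\begin{aligned}
\mathrm{SUD}_{ti} &= \beta_{0i} + \beta_{1i}(\mathrm{Age12})_{ti} + \varepsilon_{ti} \\
\beta_{0i} &= \gamma_{00} + \gamma_{01}(\mathrm{Male})_{i} + \gamma_{02}(\mathrm{NonWhite})_{i} + u_{0i} \\
\beta_{1i} &= \gamma_{10} + \gamma_{11}(\mathrm{Male})_{i} + \gamma_{12}(\mathrm{NonWhite})_{i} + u_{1i}
\end{aligned}
$$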


Here we can see that gender and ethnicity are 
allowed to affect not only the intercept of SUD, 
but also the slope of SUD on Age. This model 
can be re-expressed in the mixed effects for- 
mat as: 


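$$
\begin{aligned}
\mathrm{SUD}_{ti} ={}& \gamma_{00} + \gamma_{01}(\mathrm{Male})_{i} + \gamma_{02}(\mathrm{NW})_{i} \\
&+ \gamma_{10}(\mathrm{Age12})_{ti} + \gamma_{11}(\mathrm{Male})_{i}(\mathrm{Age12})_{ti} \\
&+ \gamma_{12}(\mathrm{NW})_{i}(\mathrm{Age12})_{ti} + u_{0i} + u_{1i}(\mathrm{Age12})_{ti} + \varepsilon_{ti}
\end{aligned}
$$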


Although somewhat complicated, the mixed 
effects version highlights the fact that by includ- 
ing level 2 predictors of the slope we are actu- 
ally entering cross-level interactions into the 
model. For example, $\gamma_{11}$ will assess the extent to
which the slope of SUD on age varies between 
girls and boys. This is, in effect, an interac- 
tion between gender (level 2) and time (level 1). 
Cross-level interactions are often the effects of 
most interest to researchers. For example, in 
longitudinal clinical trials, the test of the effec- 
tiveness of an intervention is typically mod- 
eled as a cross-level interaction of experimental 
condition (e.g., experimental vs control groups) 
by time. 

The results of this model can be seen in the 
right side of Table 32.6. The parameter estimates
show that gender and ethnicity are strong
predictors of both the intercept and the slope of
SUD over time. To interpret these effects, it again helps to
plot the prediction equations (Figure 32.5). Here 
we see the same basic finding that substance 
use increases over time. However, this model 
also shows us that we expect this increase to be 
the greatest for white males, and the lowest for 
nonwhite females. 
Figure 32.5 Predicted effects of gender and
ethnicity on linear change of total Substance Use 
Days 
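
The prediction equations plotted in Figure 32.5 can be sketched directly from the Model 2 fixed effects in Table 32.6. The snippet below is a minimal illustration in Python, not the software used for the chapter. The function names and dictionary layout are assumptions made for the example, and all random effects are set to zero so that only the average trajectory for each group is shown.

```python
# Fixed effects for Model 2, copied from Table 32.6.
GAMMA = {
    "intercept": 1.708,          # gamma_00
    "male": -1.690,              # gamma_01
    "nonwhite": -1.722,          # gamma_02
    "age12": 1.582,              # gamma_10
    "male_x_age12": 0.738,       # gamma_11 (cross-level interaction)
    "nonwhite_x_age12": -0.624,  # gamma_12 (cross-level interaction)
}


def group_intercept(male, nonwhite):
    """Predicted Substance Use Days at age 12 for a gender/ethnicity group."""
    return GAMMA["intercept"] + GAMMA["male"] * male + GAMMA["nonwhite"] * nonwhite


def group_slope(male, nonwhite):
    """Predicted yearly change in Substance Use Days for a group."""
    return (
        GAMMA["age12"]
        + GAMMA["male_x_age12"] * male
        + GAMMA["nonwhite_x_age12"] * nonwhite
    )


def predicted_sud(age, male, nonwhite):
    """Predicted Substance Use Days with all random effects set to zero."""
    return group_intercept(male, nonwhite) + group_slope(male, nonwhite) * (age - 12)


for label, male, nonwhite in [
    ("white female", 0, 0),
    ("white male", 1, 0),
    ("nonwhite female", 0, 1),
    ("nonwhite male", 1, 1),
]:
    print(
        f"{label:16s} slope = {group_slope(male, nonwhite):.3f}  "
        f"age 12 = {predicted_sud(12, male, nonwhite):6.2f}  "
        f"age 20 = {predicted_sud(20, male, nonwhite):6.2f}"
    )
```

Under these estimates the slope is 1.582 + 0.738 = 2.320 days per year for white males and 1.582 - 0.624 = 0.958 days per year for nonwhite females, which is the ordering described in the text.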


3.8 Predictors of intraindividual change 


There are many times that the predictors 
of interest in growth curve modeling will 
change over time themselves. These time- 
varying covariates can be used to model intra- 
individual change. Consider, for example, the 
effects of school transitions on daily substance 
use among youth. Social scientists have viewed
school transitions as times of higher stress for
students, as well as opportunities to form new
social networks, to gain more freedom from previous
family and peer expectations, and otherwise
to encounter a changed social environment
for substance use. It would be reasonable to
assume that substance use may look different 
after a school transition. How could this be 
modeled? 

First, consider Table 32.7, which shows an 
example data file that could be used in growth 
curve modeling. Male is a variable denoting 
gender that is a constant covariate—it does not 
change over time during the study. College tran- 
sition, on the other hand, is an indicator vari- 
able that is 1 when an interviewee is attending 
a new college during the time of the interview. 
This is a time-varying covariate—it can take on 
different values (although only two values for a 
binary variable) at different time points. 
Table 32.7 Example data file with constant and
time-varying covariates

ID    Interview  Age  Substance  Male  College
      wave            Use Days         transition
001   1          13    0         1     0
001   2          15    3         1     0
001   3          16    8         1     0
001   4          16    4         1     0
001   5          18   10         1     1
001   6          19   12         1     0
001   7          20   10         1     0
002   1          15    2         0     0
002   2          16    5         0     0
002   3          16    8         0     0
002   4          18   15         0     1
002   5          19   14         0     0
002   6          20   12         0     1
002   7          21   12         0     0
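
The distinction between constant and time-varying covariates in Table 32.7 can be checked mechanically in a long-format data file. The following Python sketch stores the rows of the table as tuples and tests whether a column ever changes within a person. The column names and the helper `is_time_varying` are assumptions introduced for the illustration.

```python
from collections import defaultdict

COLUMNS = ("id", "wave", "age", "sud", "male", "college")

# Rows of Table 32.7 in long format: one row per person per interview wave.
ROWS = [
    ("001", 1, 13, 0, 1, 0), ("001", 2, 15, 3, 1, 0), ("001", 3, 16, 8, 1, 0),
    ("001", 4, 16, 4, 1, 0), ("001", 5, 18, 10, 1, 1), ("001", 6, 19, 12, 1, 0),
    ("001", 7, 20, 10, 1, 0),
    ("002", 1, 15, 2, 0, 0), ("002", 2, 16, 5, 0, 0), ("002", 3, 16, 8, 0, 0),
    ("002", 4, 18, 15, 0, 1), ("002", 5, 19, 14, 0, 0), ("002", 6, 20, 12, 0, 1),
    ("002", 7, 21, 12, 0, 0),
]


def is_time_varying(rows, column):
    """Return True if `column` takes more than one value within any person."""
    index = COLUMNS.index(column)
    values_by_person = defaultdict(set)
    for row in rows:
        values_by_person[row[0]].add(row[index])
    return any(len(values) > 1 for values in values_by_person.values())


for name in ("male", "college", "age"):
    kind = "time-varying (level 1)" if is_time_varying(ROWS, name) else "constant (level 2)"
    print(f"{name:8s} {kind}")
```

A covariate flagged as time-varying belongs in the level 1 part of the model; one that is constant within every person can only enter at level 2.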


Using the NLSY97 data, we can examine the 
effects of college transition on substance use 
over time using the following model: 

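$$
\begin{aligned}
\mathrm{SUD}_{ti} ={}& \beta_{0i} + \beta_{1i}(\mathrm{College})_{ti} + \beta_{2i}(\mathrm{Age12})_{ti} \\
&+ \beta_{3i}(\mathrm{College})_{ti}(\mathrm{Age12})_{ti} + \varepsilon_{ti} \\
\beta_{0i} ={}& \gamma_{00} + u_{0i} \\
\beta_{1i} ={}& \gamma_{10} \\
\beta_{2i} ={}& \gamma_{20} + u_{2i} \\
\beta_{3i} ={}& \gamma_{30}
\end{aligned}
$$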


The college transition variable is entered into
the level 1 part of the growth model, because
it can take on different values at different time
points (as suggested by the t subscript). College
appears twice in the level 1 part of the model.
The college main effect ($\beta_{1i}$) will assess the
effects of college transition on the intercept of
Substance Use Days. That is, it will allow us to
see how much substance use shifts up or down
during a year when there is a college transition.
The college by age interaction term ($\beta_{3i}$), on the
other hand, allows us to see if there is a change
in the slope of substance use over time after
a college transition. This model has no level
2 predictors, and note that only the intercept
and Age12 are modeled as random effects. We
assume for this example that the effects of college
transitions are the same for all individuals.

The results of fitting this model are presented 
in Table 32.8. Similar to our previous models,
we see that youth start out at age 12 using
substances on approximately 0 days, and that this
increases by about 1.6 days of use per year.
A transition to college is associated with an
upward shift of 4.2 substance use days. However,
after a college transition, the upward trend
over time is reduced by 0.47 days per
year. This can be seen more clearly in the prediction
graph in Figure 32.6. In this graph we
examine the predicted growth curve of sub- 
stance use days for a person who enters college 
at age 18. The vertical dashed line represents 


Table 32.8 Effects of college transitions on change
in Substance Use Days

Fixed effects             Coef.     SE      t-ratio  p
Intercept (γ00)           -0.007    0.161    -0.0    0.967
College (γ10)              4.160    0.923     4.5    0.000
Age12 (γ20)                1.649    0.030    54.8    0.000
College × Age12 (γ30)     -0.471    0.126    -3.7    0.000

Random effects            Variance   χ²      p
                          component
Intercept (u0i)            80.64     15083   0.000
Linear slope (u2i)          4.26     21222   0.000
Level 1 (εti)             100.12

Model fit
Deviance                  431225.1
Parameters                8
Figure 32.6 Predicted effects of college transition
(at age 18) on linear change of Substance Use Days 


the transition to college at age 18. At that point 
we see the sudden shift of substance use days 
upward to about 14 days. This may reflect the 
greater access to alcohol that many college stu- 
dents experience. After the transition to college, 
substance use still increases, but at a slower rate 
than for youth who do not make a transition 
to college (represented by the dashed line that 
continues upward to the right). 


4 Conclusion 


As we have seen, multilevel growth curve mod- 
eling is a flexible tool for analyzing longitu- 
dinal data. By viewing longitudinal data as 
observations nested within individual cases, we 
can use the power of multilevel modeling to 
answer questions about patterns and predic- 
tors of change. A number of advanced or more 
technical topics have been passed over or only 
mentioned briefly in this chapter. In particular, 
this chapter has focused on the use of growth 
curve modeling for quantitative dependent vari- 
ables. Growth curve models can be built for 
other types of dependent variables, including 
binary, count, and ordinal variables. For more 
detailed treatment of these generalized multi- 
level models, see the relevant sections in Hox 
(2002) and Snijders and Bosker (1999). Also, the
next chapter in this volume deals with multilevel
change models for categorical dependent
variables.


Software 


Users have a large number of good choices for 
software for fitting growth curve models. Any 
statistics package that includes mixed effects 
modeling or multilevel modeling can be used 
to develop growth curve models of the type dis- 
cussed in this chapter. Table 32.9 lists the major 
software packages that are widely known and 
are powerful enough to develop a wide vari- 
ety of growth curve models. Users can choose 
to use specialized software that focuses pri- 
marily on multilevel modeling (i.e., HLM or 
MLwiN), or general-purpose statistical software 
that includes mixed effects modeling proce- 
dures (i.e., R/S-Plus, SAS, SPSS, or Stata). Users 
new to growth curve models may want to learn 
these procedures using the specialized soft- 
ware. The interface and documentation of these 
packages make for a shallower learning curve 
for growth curve modeling. More experienced 
users may wish to use the general-purpose 
software. In particular, the data management 
and graphical exploration features of R, SPSS, 
SAS, and Stata cannot be matched by HLM or 
MLwiN. 

The Centre for Multilevel Modelling (sic) 
maintains a comprehensive list of reviews 
of software packages for multilevel modeling 
at: http://www.mlwin.com/softrev/index.html. 
All of the packages listed in Table 32.9 
are reviewed at this site, but some of the 
reviews are out of date. An extremely use- 
ful site for learning about multilevel soft- 
ware is UCLA’s statistical computing portal 
at: http://www.ats.ucla.edu/stat/. For example, 
all of the data and examples from Singer and 
Willett’s textbook on longitudinal data analy- 
sis are presented for each of the six software 
packages listed in Table 32.9. See
http://www.ats.ucla.edu/stat/examples/alda.htm.
Table 32.9 Information about growth curve modeling software

Specialized multilevel modeling software

Package                 Version     Interface  Information                 Core references
HLM                     6.02        Graphical  http://www.ssicentral.com   Raudenbush, et al. (2000)
MLwiN                   2.01        Graphical  http://www.mlwin.com        Rasbash, et al. (2000)

General statistics software

Package                 Version     Interface  Information                 Core references
R/S-Plus: nlme or lme4  R: 2.3.0;   Syntax     http://www.r-project.org/   Pinheiro and Bates (2000)
                        S-Plus: 7              http://www.insightful.com
SAS: Proc MIXED         9.1.3       Syntax     http://www.sas.com          Singer (1998)
SPSS: MIXED             14          Either     http://www.spss.com         SPSS Advanced Models documentation
Stata: gllamm and       9           Syntax     http://www.stata.com        Rabe-Hesketh and Skrondal (2005)
  xtmixed                                      http://www.gllamm.org


Glossary 


AIC Akaike Information Criterion, a
parsimony-corrected measure of model fit.


BIC Bayesian Information Criterion, a
parsimony-corrected measure of model fit.


Centering A reparameterization of a predictor 
by subtracting a grand mean, group mean, or 
constant. 


Deviance This is -2 times the log-likelihood
of an estimated model.


Fixed effect This corresponds to the constant 
effects across persons of a predictor variable in 
a growth curve model. 


Growth curve model A mixed effects model 
applied to longitudinal data. 


Maximum likelihood estimation The most 
common type of estimation technique used for 
growth curve models of quantitative dependent 
variables. 


Mixed effects model A statistical model incor- 
porating both fixed and random effects, useful 
for analyzing grouped and longitudinal data. 


Random effect This corresponds to the variance
components in a growth curve model.
Parameters (slopes and intercepts) that are
allowed to vary across persons are random
effects.
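
The deviance and AIC entries in this glossary can be illustrated with the model fit values reported in Table 32.6. The Python sketch below assumes that both models were fitted by full maximum likelihood to the same data, so that the difference in deviance can be referred to a chi-square distribution. That assumption, and the function names, are introduced for the example. BIC is not computed because it also requires the sample size, which is not restated here.

```python
import math

# Deviance and number of estimated parameters, copied from Table 32.6.
MODEL_1 = {"deviance": 430916.6, "parameters": 8}
MODEL_2 = {"deviance": 430648.0, "parameters": 10}


def aic(model):
    """AIC = deviance + 2 * number of parameters (smaller is better)."""
    return model["deviance"] + 2 * model["parameters"]


def deviance_difference(simpler, fuller):
    """Chi-square statistic and degrees of freedom for two nested models."""
    chi_square = simpler["deviance"] - fuller["deviance"]
    df = fuller["parameters"] - simpler["parameters"]
    return chi_square, df


def chi_square_p_value_df2(chi_square):
    """Upper-tail p-value of a chi-square variate with exactly 2 df.

    For 2 degrees of freedom the survival function is exp(-x / 2).
    """
    return math.exp(-chi_square / 2.0)


chi_square, df = deviance_difference(MODEL_1, MODEL_2)
print(f"AIC model 1: {aic(MODEL_1):.1f}")
print(f"AIC model 2: {aic(MODEL_2):.1f}")
print(f"Deviance difference: {chi_square:.1f} on {df} df")
print(f"p-value: {chi_square_p_value_df2(chi_square):.3g}")
```

Both the drop in deviance and the lower AIC favor the model that lets the slope vary by gender and ethnicity.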
Chapter 33


Multilevel analysis with 
categorical outcomes 
Scott Menard 


In the analysis of intraindividual change using 
quantitative dependent variables, it is common 
to apply some form of growth curve analysis, 
either the latent growth curve model described 
by Stoolmiller in Chapter 31, or the multi- 
level growth curve model described by Luke in 
Chapter 32, both in this volume. With a cate- 
gorical dependent variable, however, even the 
definition of “change” is not entirely obvious 
(as illustrated in Chapter 30 on panel anal- 
ysis with logistic regression), and “growth” 
only makes sense for variables measured at the 
ratio, interval, or at least ordinal level. For a 
dichotomous dependent variable, one cannot 
go “up” or “down” on a continuum; one can 
only change from one value to the other (and 
back again). For a polytomous (multiple cate- 
gories) nominal dependent variable, there is no 
inherent meaning to the “direction” of change. 
Thus, it is more appropriate to speak of qual- 
itative “change” rather than implicitly quanti- 
tative “growth” when dealing with categorical 
dependent variables. In modeling longitudinal 
patterns of change, moreover, the latent growth 
curve modeling approach has not been very 
extensively or well adapted to the analysis 
of categorical outcomes. Instead, in terms of 
both conceptual development and even more 
in terms of readily available software, models
for categorical developmental change are
most readily implemented in the multilevel
modeling approach, with some of the simpler 
models being possible using population aver- 
aged and random effects models discussed in 
other chapters (Hilbe and Hardin, Chapter 28; 
Greenberg, Chapter 17; Finkel, Chapter 29; 
Menard, Chapter 30). 

In this chapter, considerations pertinent to 
the use of multilevel growth curve models for 
categorical dependent variables are introduced 
in the context of analyzing a multilevel change 
model of marijuana use. We begin with issues in 
identifying the relationship between the depen- 
dent variable and the time dimension used in 
the model, with particular attention to the use 
of orthogonal polynomial contrasts. We then 
examine the general structure of the multilevel 
model, and its specific application to categor- 
ical dependent variables. Following a descrip- 
tion of the data to be used in the examples, we 
then calculate population averaged (marginal) 
and unit-specific (or subject-specific, or con- 
ditional) models for the prevalence of mari- 
juana use, including random as well as fixed 
effects in the models. (It may be useful here 
to review the material on marginal and condi- 
tional models with categorical dependent vari- 
ables in Chapter 30.) The focus here is on the 
multiwave longitudinal multilevel model with 
a dichotomous outcome. The chapter ends with 
a brief discussion of extensions of the models
presented here to include additional interac- 
tion effects, and to the inclusion of polytomous 
dependent variables. 


1 Specifying the relationship 
between the categorical dependent 
variable and time 


For any model involving the pattern or timing 
of change in a categorical dependent variable, 
a useful step in constructing logistic regres- 
sion change models consists of screening the 
data to determine the zero-order relationship 
between the dependent variable and time. For 
a dichotomous dependent variable, it may be 
informative to plot the mean of the dependent 
variable (the mean probability of a response of 
1 as opposed to a response of 0) with respect to 
the time dimension (age or chronological time). 
Polytomous nominal dependent variables are 
problematic in this respect, because there is no 
“increase” or “decrease” in the dependent vari- 
able, only change. Here, for manageably short 
series, a contingency table of the dependent 
variable with time may be useful for explor- 
ing the data, but for longer series (e.g., more 
than 20 or so cases) this becomes unwieldy. 
A polytomous ordinal dependent variable can, 
for this purpose, be treated as an interval vari- 
able, and the average (mean or median) rank 
can be plotted against age or chronological time. 
Such plots, however, may not by themselves 
be adequate, particularly when the focus is on 
change within cases. One approach to exam- 
ining patterns of intraindividual change is to 
plot the values of the dependent variable along 
the time dimension for each individual sep- 
arately, possibly overlaying the plots. For a 
small number of cases, this may produce dis- 
tinct and distinguishable patterns, but with over 
1000 cases, the plot is likely to be an indistinct 
blob, possibly obscuring even outlying cases. 
(For a more extended discussion of issues in
graphing intraindividual change across time,
see Fitzmaurice, Chapter 13, in this volume.)
With large numbers of cases and a focus on 
change within cases, therefore, initial screen- 
ing may best be done using a model with a 
polynomial function of time as a predictor. The
degree of the polynomial (the highest power
included in the function) can be no higher than
one less than the number of time points (period
or age) against which the dependent variable is
plotted, and as a practical matter should probably
be about three less than the number of
time points. The model would then be
$\mathrm{logit}(Y) = \ln[Y/(1 - Y)] = \alpha + \beta_1 t + \beta_2 t^2 + \cdots + \beta_{T-3} t^{T-3}$,
where t is the time variable (age or period), and
there are a total of t = 1, 2, ..., T distinct periods
for which the model is being estimated. This
procedure, although especially informative for
explaining intracase change for a large number
of cases, can be informative for deciding the
function of time to be used, at least initially,
in the full model. For dichotomous dependent
variables, a bivariate model with age
or time as the only predictor can be estimated
using an orthogonal polynomial contrast. (For
a general discussion of the use of different 
contrasts for categorical predictors in logistic 
regression analysis, see Menard, 2002.) For a 
polytomous dependent variable, the same pro- 
cedure can be followed by first selecting an 
appropriate contrast, then using that contrast to 
dichotomize the dependent variable into c - 1
dichotomous variables, where c is the number
of categories in the dependent variable. The
c - 1 dichotomous variables will correspond to
the c - 1 logistic functions to be modeled. For
a nominal polytomous dependent variable, an 
indicator contrast corresponding to the baseline 
category logit model would be most appropri- 
ate; for an ordinal polytomous dependent vari- 
able, a contrast corresponding to the type of 
ordinal logistic regression (e.g., cumulative or 
continuation ratio logit) should be used. Once 
again, an orthogonal polynomial contrast for the 
predictor (age or time) should be used, but there 
will be c - 1 separate logistic regression equations
to estimate.
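
The first screening step described above, plotting the mean of a dichotomous dependent variable against time, amounts to computing the prevalence at each time point. The Python sketch below does this for a handful of hypothetical records (the data and the function names are invented for the illustration) and also reports the logit of each prevalence, which is the scale on which the polynomial in time is fitted.

```python
import math
from collections import defaultdict

# Hypothetical long-format records: (case_id, age, response) with 1 = yes, 0 = no.
RECORDS = [
    (1, 11, 0), (1, 12, 0), (1, 13, 1),
    (2, 11, 0), (2, 12, 1), (2, 13, 1),
    (3, 11, 0), (3, 12, 0), (3, 13, 0),
    (4, 11, 1), (4, 12, 1), (4, 13, 1),
]


def prevalence_by_time(records):
    """Mean of the 0/1 response at each time point."""
    totals = defaultdict(lambda: [0, 0])  # time -> [sum of responses, count]
    for _case_id, time, response in records:
        totals[time][0] += response
        totals[time][1] += 1
    return {time: s / n for time, (s, n) in sorted(totals.items())}


def logit(p):
    """Log odds; undefined when p is exactly 0 or 1."""
    return math.log(p / (1.0 - p))


prevalence = prevalence_by_time(RECORDS)
for age, p in prevalence.items():
    print(f"age {age}: prevalence = {p:.2f}, logit = {logit(p):+.3f}")
```

A time point at which every response is 0 or every response is 1 has no finite logit, which is one practical reason to inspect the prevalences before fitting the model.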

For age (A) or time (T) in their original metrics,
the different powers of the polynomial will
be highly correlated, e.g., A with $A^2$ and $A^3$,
producing collinearity among the powers of A
and making it difficult or impossible to separate
linear (A) from quadratic ($A^2$) and cubic ($A^3$)
effects. Centering A by subtracting the mean
eliminates the correlation between the linear
and quadratic effects when the values of A are
distributed symmetrically about the mean:
$(A - \mu_A)$ should then be uncorrelated with
$(A - \mu_A)^2$. For higher powers, however,
centering is generally not sufficient, and
more complex methods of calculating orthogonal
polynomials may be required. An orthogonal
polynomial contrast can be automatically
implemented in some existing software (e.g.,
SAS or SPSS). The equation being modeled can
be expressed as
$\mathrm{logit}(Y) = \alpha + \beta_1 f_1(t) + \beta_2 f_2(t^2) + \cdots + \beta_{T-1} f_{T-1}(t^{T-1})$,
where t is again the time
dimension (age or chronological time) and f(t) is
a transformation designed to make the contrast
involving time orthogonal.
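
One standard way to obtain such a transformation is Gram-Schmidt orthogonalization of the powers of time. The Python sketch below is an illustration of that general technique under assumed, equally spaced ages. It is not claimed to reproduce the exact contrast coefficients generated by SAS or SPSS, and the function names are introduced for the example. It also shows why centering alone is not enough for the cubic term.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_polynomials(times, degree):
    """Gram-Schmidt orthogonalization of t, t^2, ..., t^degree.

    Returns one unit-length list per power. Each list is orthogonal to the
    constant term and to every other returned list.
    """
    if degree >= len(set(times)):
        raise ValueError("degree must be less than the number of distinct time points")
    n = len(times)
    mean_t = sum(times) / n
    centered = [t - mean_t for t in times]  # centering improves numerical stability
    basis = [[1.0] * n]
    for power in range(1, degree + 1):
        column = [c ** power for c in centered]
        for previous in basis:
            weight = dot(column, previous) / dot(previous, previous)
            column = [c - weight * p for c, p in zip(column, previous)]
        norm = dot(column, column) ** 0.5
        basis.append([c / norm for c in column])
    return basis[1:]


# Hypothetical equally spaced ages.
ages = [11, 12, 13, 14, 15, 16, 17]
linear, quadratic, cubic = orthogonal_polynomials(ages, 3)

# Centering alone: linear and cubic terms remain strongly correlated.
centered = [a - sum(ages) / len(ages) for a in ages]
cubed = [c ** 3 for c in centered]
# Both vectors have mean zero here, so this ratio is their Pearson correlation.
r_linear_cubic = dot(centered, cubed) / (dot(centered, centered) * dot(cubed, cubed)) ** 0.5

print(f"corr(centered A, centered A^3) = {r_linear_cubic:.3f}")
print(f"linear . quadratic = {dot(linear, quadratic):.2e}")
print(f"linear . cubic     = {dot(linear, cubic):.2e}")
print(f"quadratic . cubic  = {dot(quadratic, cubic):.2e}")
```

After orthogonalization the inner products are zero up to rounding error, so the linear, quadratic, and cubic effects can be assessed separately.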

By examining the statistical significance of 
the categorical time dimension variable as a 
whole, it is possible to see whether age or 
time has any statistically significant bivariate 
influence on the dependent variable, but given 
the possibility of a suppressor effect, statistical 
nonsignificance of the time dimension in the 
bivariate analysis is not a sufficient basis for 
eliminating the time dimension from the full 
model. A slightly different approach may be 
followed, however, in examining the statistical 
significance of the specific powers of the time 
dimension. As a general observation, it is rare in 
both the social and the physical sciences to find 
a relationship that requires higher than a cubic 
power between a predictor and an outcome. For 
powers higher than 4, there is a danger that 
the model may be overfitted, and that the addi- 
tional variation being explained by the higher 
powers of the polynomial may be random rather 
than systematic variation in the dependent variable.
For this reason, it is generally reasonable
to eliminate all powers higher than the last
statistically significant power of age or time from
the model for further estimation.

If it is the case that there are statistically non- 
significant coefficients for powers of t lower 
than the highest power of t for which there is 
a statistically significant coefficient, there are 
three options: (1) stepwise elimination of sta- 
tistically nonsignificant powers of t using back- 
ward elimination, regardless of whether the 
nonsignificant powers of t thus eliminated are 
lower than the highest power of t for which a 
statistically significant coefficient is obtained; 
(2) hierarchical elimination, in which all pow- 
ers of t lower than the highest power of t for 
which there is a statistically significant coef- 
ficient are retained; or (3) forward inclusion, 
stopping when the next power of t is statisti- 
cally nonsignificant. The danger of using option 
(3) is the same as the danger of using forward 
stepwise inclusion generally, misspecification 
by the omission of effects that would be discov- 
ered as statistically significant in a full model 
or a reduced model using backward elimina- 
tion. Option (2) is probably the safest option 
in the sense of avoiding misspecification, but it 
runs the risk of inefficiency by the inclusion of 
unnecessary parameters in the model. Option 
(1) seems like a reasonable balance between 
the two. In general, hierarchical elimination 
(option 2) seems to be the preferred option 
in practice, but there is generally little reason 
to expect the function of time expressed as a 
higher order polynomial to be hierarchical in 
nature. Bear in mind, too, that the polynomial 
form of the function of time may be only an 
approximation to another function of time, not 
a representation of the true function of time, 
and a higher order polynomial may reflect this. 

Further insight on this may be attained when 
other predictors are added into the model, if 
coefficients for higher order functions of t are 
no longer statistically significant in the pres- 
ence of statistical control for other predictors. 
The initial results regarding the powers of t that 
are statistically significant, however, do give us
a starting point for estimating the model with 
a particular function of time. It may not be the 
case, however, that higher powers of time are 
necessary in the full model. Instead, the apparent
influence of the higher powers of time may
reflect the interaction between the time
dimension and one or more time-varying covari- 
ates, predictors whose values, like that of the 
dependent variable, can change over time. This 
can be tested by comparing models (1) hav- 
ing both the higher order powers of the time 
dimension plus the time-covariate interactions 
in the model, with models having (2) the time- 
covariate interactions but not the higher pow- 
ers of the time dimension, and (3) the higher 
order powers of the time dimension but not the 
time-covariate interactions. This may be rele- 
vant in models of either historical or develop- 
mental change, but seems to be more typical of 
developmental change. 


2 Multilevel logistic regression 
models for repeated measures data 


Multilevel change models of intraindividual (or 
more generally intracase) change are discussed 
in Bijleveld et al. (1998, Ch. 5), Raudenbush 
and Bryk (2002, Ch. 6), and Snijders and Bosker 
(1999, Ch. 12). Raudenbush et al. (2000) and 
Snijders and Bosker (1999) include chapters on 
analysis of categorical (dichotomous, nominal, 
ordinal, and count) dependent variables. The 
basic model for multilevel analysis of longitu- 
dinal data involves two levels, the individual or 
case level (level 2), with data that describe char- 
acteristics of the case that do not vary over time, 
and the observation level (level 1), with data 
on repeated measurements of time-varying indi- 
vidual characteristics, including the dependent 
variable. A simple descriptive change model 
would include no level 2 predictors, and only a 
measure of time or age (or both) as a predictor in 
the level 1 model. In this case the effect of time 
on the dependent variable is said to be fixed (as
opposed to random, i.e., variable). More complex
models could include more complex functions
of time (e.g., quadratic or cubic polynomials)
and additional time-invariant covariates at
level 2 plus time-varying covariates at level 1. 
The level 1 equation in this context repre- 
sents the repeated observations nested within 
individuals or cases, and has the form 


$$
\eta_{tj} = \mathrm{logit}(Y_{tj}) = \beta_{0j} + \beta_{1j}X_{1tj} + \beta_{2j}X_{2tj} + \cdots + \beta_{Kj}X_{Ktj} + r_{tj} \qquad (1)
$$

where the subscript t = 1, 2, ..., T refers to the
measurement times, k = 1, 2, ..., K refers to the
predictors $X_1, X_2, \ldots, X_K$, and j = 1, 2, ..., J
refers to the specific cases (typically individuals)
for which the parameters $\beta_0, \beta_1, \ldots, \beta_K$ are
calculated; and $\beta_{0j}$ is the intercept (instead of $\alpha$)
to simplify the multilevel notation. The dependent
variable $\eta$ is a function of Y in a generalized
linear model, and our specific concern is
with $\eta_{tj} = \mathrm{logit}(Y_{tj})$ or a parallel transformation
for the nominal and ordinal polytomous logistic
regression models. The term $r_{tj}$ represents
a random effect (essentially random error) at
level 1. The predictors $X_1, X_2, \ldots, X_K$ are
time-varying covariates, predictors which like the
dependent variable represent repeated observations
or measurements nested within the cases.
The level 2 predictors $W_1, W_2, \ldots, W_Q$ are
time-constant covariates, stable characteristics of the
cases on which the repeated measurements (of
at least the dependent variable) are taken over
time. At level 2 (the case or, most often, the
individual level), we can model the level 1 coefficients
as a function of an intercept (an individual
or case mean value) and the q = 1, 2, ...,
Q time-constant covariates:


$$
\beta_{kj} = \gamma_{k0} + \gamma_{k1}W_{1j} + \gamma_{k2}W_{2j} + \cdots + \gamma_{kQ}W_{Qj} + u_{kj} \qquad (2)
$$


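where $u_{kj}$ is the level 2 random effect. Combining
the level 1 and level 2 models,

$$
\eta_{tj} = \Bigl(\gamma_{00} + \sum_{q=1}^{Q}\gamma_{0q}W_{qj} + u_{0j}\Bigr) + \sum_{k=1}^{K}\Bigl(\gamma_{k0} + \sum_{q=1}^{Q}\gamma_{kq}W_{qj} + u_{kj}\Bigr)X_{ktj} + r_{tj} \qquad (3)
$$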
where the respective level 1 $\beta$ coefficients have
been replaced by their level 2 equations. Possible
models include pure fixed effect models
(all $u_{kj} = 0$ by definition), random intercept
models with fixed coefficients ($u_{0j} \neq 0$ but all
other $u_{kj} = 0$ by definition), and more broadly
random coefficient (random slope plus random
intercept) models (random components $u_{kj}$ are
included in at least some of the $\beta$ coefficients).
If the only random coefficient is the random 
intercept, it is possible to estimate the model 
using something other than multilevel software, 
e.g., Stata xtlogit for a dichotomous dependent 
variable. More complex models with additional 
random coefficients can be estimated using ded- 
icated multilevel modeling software that per- 
mits the use of categorical dependent variables, 
such as HLM. 
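
The way the level 2 equations feed into the level 1 equation can be traced numerically. The Python sketch below builds the case-specific coefficients from equation (2), plugs them into equation (1), and converts the resulting logit to a probability. All numerical values are hypothetical, the function names are assumptions made for the example, and the level 1 term $r_{tj}$ is omitted so that only the systematic part of the model is shown.

```python
import math


def level2_coefficients(gamma, w, u):
    """beta_kj = gamma_k0 + sum_q gamma_kq * W_qj + u_kj, for k = 0, ..., K."""
    return [
        g[0] + sum(g_q * w_q for g_q, w_q in zip(g[1:], w)) + u_k
        for g, u_k in zip(gamma, u)
    ]


def linear_predictor(beta, x):
    """eta_tj = beta_0j + sum_k beta_kj * X_ktj (level 1 residual omitted)."""
    return beta[0] + sum(b_k * x_k for b_k, x_k in zip(beta[1:], x))


def inverse_logit(eta):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values: K = 2 time-varying predictors, Q = 1 time-constant predictor.
gamma = [
    [-2.0, 0.30],   # gamma_00, gamma_01: intercept equation
    [0.15, 0.00],   # gamma_10, gamma_11: coefficient of X_1
    [-0.10, 0.02],  # gamma_20, gamma_21: coefficient of X_2
]
w_j = [1]             # W_1j for this case
u_j = [0.5, 0.0, 0.0]  # random intercept model: only u_0j is nonzero
x_tj = [10, 25]       # X_1tj and X_2tj at one measurement occasion

beta_j = level2_coefficients(gamma, w_j, u_j)
eta_tj = linear_predictor(beta_j, x_tj)
print(f"beta_j = {[round(b, 3) for b in beta_j]}")
print(f"eta_tj = {eta_tj:.3f}, probability = {inverse_logit(eta_tj):.3f}")
```

Setting every entry of `u_j` to zero gives a pure fixed effect model, while nonzero entries beyond the first correspond to random slopes.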

Unlike multilevel growth curve models for 
interval/ratio dependent variables, the use of 
multilevel change modeling for categorical 
dependent variables will typically involve less 
of a concern with variance components, and 
particularly with the allocation of variance in 
the dependent variable between the level 1 
and level 2 components of the model, than 
is the usual practice using multilevel model- 
ing. Instead, the focus will generally be on 
(a) how well we can explain the variation in 
the dependent variable, and (b) the statistical 
and substantive significance of different predic- 
tors as predictive or explanatory variables. In 
logistic regression analysis more generally, the
focus is most appropriately on explained variation,
as measured by the likelihood ratio coefficient
of determination $R_L^2$ (Menard, 2000); but
in the context of multilevel modeling, estimation
often involves approaches other than maximum
likelihood (see, e.g., Raudenbush and
Bryk, 2002) and hence the appropriate maximum
likelihood statistics for calculating $R_L^2$
are typically not available. In this case, the
squared correlation between the observed and
predicted values of the dependent variable, $R_O^2$,
may be the best we can do for a measure of
explained variation. For individual predictors,
particularly when different predictors are measured
on different scales, and especially for predictors
measured on arbitrary metrics, the fully
standardized logistic regression coefficient
$b^* = (b)(s_X)(R)/s_{\mathrm{logit}(\hat{Y})}$, as defined in Menard (2004),
can be used to indicate substantive significance.
In this formula, $b^*$ is the standardized coefficient,
b is the unstandardized coefficient, $s_X$ is
the standard deviation of the predictor, R is the
correlation between the observed (zero or one)
and predicted (probability ranging from zero to
one) values of the outcome, and $s_{\mathrm{logit}(\hat{Y})}$ is the
standard deviation of the predicted values of logit(Y).
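
The fully standardized coefficient can be computed from quantities that any logistic regression program reports. The Python sketch below applies the formula to a small hypothetical bivariate example. The data, the coefficient values, and the function names are invented for the illustration, and sample (n - 1) standard deviations are assumed.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def standardized_coefficient(b, x, y_observed, p_predicted):
    """b* = b * s_X * R / s_logit(Y-hat)."""
    s_x = stdev(x)
    s_logit = stdev([logit(p) for p in p_predicted])
    r = pearson(y_observed, p_predicted)
    return b * s_x * r / s_logit


# Hypothetical bivariate model: logit(Y) = a + b * X.
a, b = -2.8, 0.8
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
p_hat = [1.0 / (1.0 + math.exp(-(a + b * x_i))) for x_i in x]

b_star = standardized_coefficient(b, x, y, p_hat)
print(f"R  = {pearson(y, p_hat):.3f}")
print(f"b* = {b_star:.3f}")
```

With a single predictor and a positive coefficient, the predicted logit is a linear function of X, so the standard deviation of the predicted logits equals b times $s_X$ and the standardized coefficient reduces to R. With several predictors the two quantities differ, and the standardized coefficients can be compared across predictors measured on different scales.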


3 Multilevel logistic regression for 
prevalence of marijuana use 


To illustrate the application of the unit-specific 
and population averaged multilevel models in 
logistic regression for a dichotomous dependent 
variable, we turn once again (as in Chapter 30) 
to the prevalence of marijuana use as our depen- 
dent variable, with exposure, belief, gender, 
and ethnicity as predictors, using data from 
the National Youth Survey (NYS), a multiwave 
longitudinal study of a self-weighting national 
household probability sample of 1725 individ- 
uals who were 11-17 years old when first inter- 
viewed in 1977, and who were last interviewed 
in 2002. As described previously in Chapter 30, 
the dependent variable is marijuana use, or 
more specifically change in the prevalence (yes 
or no) of marijuana use. The predictors are 
(1) exposure to delinquent friends, a scale indi- 
cating how many of one’s friends have engaged 
in nine different types of delinquent behav- 
ior ranging from assault to theft to illicit and 
underage substance use and drug sales, plus 
whether they have encouraged the respondent 
to do anything against the law; (2) belief that it 
is wrong to violate the law, a scale indicating 
how wrong the respondent thinks it is to engage 
in any of nine types of behavior (the same as the 
first nine items in the exposure to delinquency 
scale); (3) age, measured as years since birth; (4)
gender, coded 0 = female, 1 = male; and (5) ethnicity,
with white non-Hispanic respondents as
the reference category and two other categories,
African-American and Other.

Exposure and belief are both time-varying 
covariates on which we have repeated measures 
across the different ages and periods, and thus 
will be included in the level 1 model. Gen- 
der and ethnicity are time-invariant covariates, 
unchanging characteristics of the individuals 
within which the observations are nested, and 
will be included in the level 2 model. Age varies 
over time, but does so identically for every case 
(everyone ages at the same rate). Theoretically, 
marijuana use should be positively associated 
with exposure to delinquent friends and nega- 
tively associated with belief that it is wrong to 
violate the law. Age, ethnicity, and gender are 
included as demographic controls, but associ- 
ations of age, gender and ethnicity with mar- 
ijuana use have been found in past research. 
For a more detailed description of the sam- 
ple, the variables, and the theoretical basis for 
the models tested here, see Elliott et al. (1989). 
Here, since we want to model the relationship 
between marijuana use and age, we begin by 
examining the mean age-specific prevalence of 
marijuana use plus the relationship between 
prevalence of marijuana use and age with an 
orthogonal polynomial contrast for age. 

Figure 33.1 shows the pattern of the mean 
prevalence of marijuana use with age. (A plot 
of the mean of the logit of marijuana use with 
age has a similar shape but different numerical 
values.) It appears from the plot that marijuana 
use peaks around ages 17-19, then declines 
(a little irregularly) up to age 33, the oldest 
age for which data are available in this NYS 
dataset. The pattern suggests that there is at 
least a quadratic relationship involving $(\mathrm{Age})^2$,
and possibly higher powers of age.

Figure 33.1 Mean prevalence of marijuana use by age

Analysis of the relationship between marijuana use and
the orthogonal polynomial contrast of age indicated
that the linear coefficient of Age, or the
coefficient of $(\mathrm{Age})^1$, was not statistically
significant, but the coefficients for the second,
third, and fourth powers of Age were statistically
significant as predictors of marijuana use.
These results were used to construct the pre- 
dictors Agep1, Agep2, Agep3, and Agep4 (Age 
to the power 1, Age to the power 2, ..., Age 
to the power 4) for inclusion in the analysis 
of marijuana use over the portion of the life 
span included in the NYS data. A word of cau- 
tion: the fourth order polynomial function of 
age appears to be appropriate for the age range 
included in the sample, but it would generally 
be inappropriate to try to generalize the results 
beyond the ages included in the sample. In par- 
ticular, projecting the fourth order polynomial 
function of age below age 11 or above age 33 to 
predict prevalence of marijuana use at earlier 
or later ages would be inappropriate, and such 
projections, taken far enough beyond the age 
ranges for which they were estimated, typically 
produce nonsensical results. 
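
One way to build orthogonal polynomial terms for age is sketched below, using Gram-Schmidt orthogonalization of centered powers of age. This is an assumption about the construction, not a description of how SAS or SPSS compute their contrasts; in particular the unit-length scaling and the function name are illustrative choices. The variable names Agep1 through Agep4 follow the text.

```python
import math

def orthogonal_polynomials(values, degree):
    """Return `degree` orthonormal polynomial contrasts for `values`.

    Centered powers 1..degree are orthogonalized (Gram-Schmidt) against the
    constant term and against each lower-order contrast.
    """
    n = len(values)
    center = sum(values) / n
    centered = [v - center for v in values]
    basis = [[1.0 / math.sqrt(n)] * n]          # normalized constant term
    for power in range(1, degree + 1):
        column = [c ** power for c in centered]
        for q in basis:                         # remove lower-order components
            proj = sum(a * b for a, b in zip(column, q))
            column = [a - proj * b for a, b in zip(column, q)]
        norm = math.sqrt(sum(a * a for a in column))
        basis.append([a / norm for a in column])
    return basis[1:]                            # drop the constant term

ages = list(range(11, 34))                      # ages 11 through 33
agep1, agep2, agep3, agep4 = orthogonal_polynomials(ages, 4)
```

Because the resulting terms are mutually uncorrelated, the coefficient for each power can be tested without the collinearity that raw powers of age would introduce.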

We also need to contend with a feature of 
the National Youth Survey (NYS) data that may 
occur in other datasets as well: unequal mea- 
surement intervals. For the data used here, mea- 
surements were taken for the years 1976, 1977, 
1978, 1979, 1980, 1983, 1986, 1989, and 1992. 
Also, marijuana use and exposure are measured
for the interval spanning the entire year for 
which measurement occurs, but belief is mea- 
sured for a point in time at the end of that 
measurement period. If we lag both exposure 
and belief as predictors of marijuana use (as 
one might in a two or three wave panel model), 
we have a lag of three years between belief 
and exposure as predictors and marijuana use 
as an outcome. Possible solutions to this prob- 
lem include (1) ignoring the length of the mea- 
surement period, risking misspecification and 
underestimation of the impact of exposure and 
belief in the later waves of the model; (2) ignor- 
ing the incorrect temporal order of belief rel- 
ative to exposure and marijuana use, risking 
overestimation of the relationship of belief to 
marijuana use and calling into question the 
causal direction of the relationship; or (3) using 
a one-year lag for each wave, and where neces- 
sary imputing the values of the lagged predic- 
tors, belief and possibly exposure as well. There 
are several possible implementations of this last 
approach. The simplest would be linear inter- 
polation, calculating the change per year and 
then subtracting that amount from the values for 
exposure and belief as measured for 1983, 1986, 
1989, and 1992. A more complex alternative 
would be to use information about the distribu- 
tion of exposure and belief over time, plus infor- 
mation about all of the waves of data (instead 
of just using the information for the current 
and previous wave) to estimate the missing val- 
ues. Rather than focus on the issues involved in 
missing value imputation, linear interpolation 
is used here to produce estimates of the lagged 
values of the predictors exposure and belief. 
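
The linear interpolation described above can be written out as follows. This is a minimal sketch; the function name and the exposure scores are hypothetical, chosen only to show the arithmetic for a three-year gap between waves.

```python
def one_year_lag(previous_value, current_value, years_between):
    """Estimate the value one year before the current wave.

    The change per year between two adjacent waves is computed and then
    subtracted once from the current-wave value.
    """
    change_per_year = (current_value - previous_value) / years_between
    return current_value - change_per_year

# Hypothetical exposure scores for one respondent at the 1980 and 1983 waves.
exposure_1980 = 12.0
exposure_1983 = 18.0

# Interpolated exposure for 1982, used as the one-year lagged predictor
# of marijuana use in 1983.
exposure_1982 = one_year_lag(exposure_1980, exposure_1983, years_between=3)
print(exposure_1982)
```

For the annual waves (1976 through 1980) no interpolation is needed, since the previous wave already supplies the one-year lag.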


4 The population averaged model 
for prevalence of marijuana use 


In a population averaged model, the concern is
with rates or averages in the population: how
much effect an increase in average exposure or
average belief would have on the average prevalence
of marijuana use in the population. This


model would typically be more appropriate for 
(1) historical change, when all of the cases 
in the population experienced the same his- 
torical influences, and (2) broadly-based pop- 
ulation level as opposed to individual-level 
interventions, when all of the cases received 
the same intervention. The population aver- 
aged or marginal model includes within-case 
interdependence of the observations by averag- 
ing effects across all cases. Table 33.1 presents 
the results of calculating a population averaged 
model for the prevalence of marijuana use with 
gender, ethnicity, exposure, and belief (both 
lagged to be measured temporally prior to the 
measurement of marijuana use), and the orthog- 
onal fourth power polynomial function of age. 
Not shown here are tests for collinearity, which 
indicated that collinearity among the predictors
was not a problem. The $R_O^2$ statistic for the
model indicates a moderate level of explained
variance ($R_O^2 = .28$, $p = .000$). Also included in
this row is the number of degrees of freedom 
for level 2 (individual respondents) and level 1 
(observations nested within respondents), used 
in calculating the statistical significance of the 
Student’s t statistics (not to be confused with 
the variable t for the time dimension) for the 
coefficients. 

The dependent variable (logit marijuana use) 
is listed along with the predictors in the first 
column in Table 33.1. The second column 
shows the standardized logistic regression coef- 
ficient (b*), the third presents the unstandard- 
ized logistic regression coefficient (b), and the 
fourth contains the standard error of the unstan- 
dardized logistic regression coefficient. The 
fifth column shows a Student’s t statistic based 
on the estimated unstandardized coefficient 
and its standard error, from which the p value is 
found based on the number of degrees of free- 
dom, in this instance 1672 for the intercept, 
gender, and ethnicity, and 10,950 for the other 
predictors in the model. The statistical significance
of the Student’s t statistic is presented in
the sixth column.
Table 33.1 Population averaged model for prevalence of marijuana use

$R_O^2 = .28$, $p = .000$
df = 1672 (level 2)
df = 10,950 (level 1)

Predictor                  b*        b     SE(b)   Student’s t     p
Logit(marijuana use)        -        -       -          -          -
Male                      .002     .008    .108        .08       .938
Black                    -.035    -.221    .153      -1.44       .149
Other                    -.023    -.224    .231       -.97       .334
Exposure (lagged)         .248     .137    .009      14.62       .000
Belief (lagged)          -.221    -.130    .010     -12.64       .000
Age                      -.026    -.307    .195      -1.57       .116
Age^2                    -.189   -2.343    .198     -11.82       .000
Age^3                     .148    1.646    .165       9.96       .000
Age^4                    -.039    -.440    .161      -2.73       .007
Intercept/Constant          -     -.971    .085     -11.49       .000


Exposure, belief, and a function of age are 
statistically significant predictors, but gender 
and ethnicity do not appear to be statistically 
significant predictors of marijuana use. The 
strongest predictor of marijuana use appears 
to be exposure, followed by belief that it is 
wrong to violate the law. Because age is split 
into four components, two of which have sub- 
stantial standardized coefficients, the question 
arises, what is the contribution of age relative to 
other variables in the model, particularly expo- 
sure and belief, to the explanation of marijuana 
use? One possible approach to answering this 
question is to use the technique described in 
Menard (2004) of multiplying each standard- 
ized coefficient by the corresponding zero order 
correlation to estimate the direct contribution to 
the explained variance of each predictor, which 
gives us 


\[
\sum b^{*} r = (-.032)(-.026) + (-.201)(-.189) + (.169)(.148) + (.027)(-.039) = .063.
\]


Notice that the last of the four components of
the sum is negative (but very small, -.001),
which is not uncommon with correlations and standardized
coefficients that are small in magnitude.
For exposure, the direct contribution to
the explained variance is (.447)(.248) = .111,
and for belief it is (-.439)(-.221) = .097. Similarly,
the direct contribution of ethnicity is
(-.032)(-.035) + (-.027)(-.023) = .002, and for
gender it is (.083)(.002) = .0001, a negligible
effect.
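
The arithmetic behind these direct contributions can be reproduced with a short Python sketch. The correlations and standardized coefficients are the rounded values quoted in the text; the function name and the dictionary layout are illustrative assumptions.

```python
def direct_contribution(pairs):
    """Sum of (zero-order correlation * standardized coefficient) products."""
    return sum(r * b_star for r, b_star in pairs)

# (zero-order correlation, standardized coefficient) pairs from the text.
contributions = {
    "age": direct_contribution(
        [(-.032, -.026), (-.201, -.189), (.169, .148), (.027, -.039)]
    ),
    "exposure": direct_contribution([(.447, .248)]),
    "belief": direct_contribution([(-.439, -.221)]),
    "ethnicity": direct_contribution([(-.032, -.035), (-.027, -.023)]),
}

for name, value in contributions.items():
    print(name, round(value, 3))
```

Because the inputs are already rounded to three decimals, the results agree with the published figures only to about the third decimal place.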


5 The unit-specific model for 
prevalence of marijuana use 


In contrast to the population averaged model, 
in the unit-specific model, the concern is with 
the extent to which a change in an independent 
variable for a particular observation is associ- 
ated with a change in the dependent variable for 
that same case. For example, how much of an 
impact would an increase in exposure or belief 
for a particular individual have on that indi- 
vidual’s marijuana use? This model would typi- 
cally be more appropriate for (1) developmental 
change, when different cases in the population 
had different individual experiences (e.g., different
changes in belief) over time, and (2)
narrowly based individual level as opposed 
to population level interventions, when differ- 
ent cases were subjected to different interven- 
tions or different levels of intervention. The 
random effects model includes specific random 
components for the cases which are not actu- 
ally estimated, but the parameters (the variance 
components) of the assumed distribution of the 
random effects are incorporated into the model 
(Hardin and Hilbe, 2003, pp. 42-49). As a prac- 
tical matter, results from estimating population 
averaged and random effects models are often 
quite similar, but in some instances there may 
be interesting differences. Table 33.2, parallel 
in structure to Table 33.1, presents the results 
of estimating a random effects model for the 
prevalence of marijuana use. 

A random effects model may include a random
intercept $\beta_{0j}$, random slopes $\beta_{1j}, \beta_{2j}, \ldots, \beta_{Kj}$,
or both. In the present analysis, the model was
tested separately for random effects in the coef- 
ficients for age and the coefficients for expo- 
sure and belief, but the random effects for 
these coefficients were not statistically signif- 
icant, and thus they are excluded from the 
model. The variance attributable to the random 
intercept is statistically significant (p = .000), 
and the intraclass correlation (the proportion
of the total variance attributable to the variance
between level 2 units, in this case individual
respondents) is $\rho = .518$ ($p = .000$). In
other words, roughly half of the total variance 
between observations actually occurs between 
individuals (the level 2 units). 
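
For a random intercept logistic model, one common convention (assumed here; the text does not say how rho was computed) treats the level 1 residual as following a standard logistic distribution with variance pi^2/3, so that rho is the random intercept variance divided by that variance plus pi^2/3. Under that assumption, the sketch below converts between the intercept variance and the intraclass correlation; the function names are hypothetical.

```python
import math

# Variance of the standard logistic distribution, used as the level 1
# residual variance under the latent-variable convention assumed here.
LEVEL1_VARIANCE = math.pi ** 2 / 3

def intraclass_correlation(intercept_variance):
    """Share of total variance that lies between level 2 units."""
    return intercept_variance / (intercept_variance + LEVEL1_VARIANCE)

def intercept_variance_from_icc(rho):
    """Random intercept variance implied by a given intraclass correlation."""
    return rho * LEVEL1_VARIANCE / (1.0 - rho)

implied_variance = intercept_variance_from_icc(0.518)
print(round(implied_variance, 2))
print(round(intraclass_correlation(implied_variance), 3))
```

If the software in use defines rho differently, the implied intercept variance from this conversion would not apply.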

For the model in Table 33.2, $R_O^2 = .29$ ($p = .000$).
Once again, exposure, belief, and age are
statistically significant, and gender is not statistically
significant as a predictor of marijuana
use, but this time ethnicity is also a statistically
significant predictor. Substantively, the
results are similar to the results from the
population averaged model.


Table 33.2 Random effects model for prevalence of marijuana use

$R_O^2 = .29$, $p = .000$
df = 1672 (level 2)
df = 10,950 (level 1)
$D_M = 29,565.17$
$D_{0(\mathrm{HLM})} = 31,691.19$; $D_{0(\mathrm{logit})} = 14,678.19$
$G_M = 2126.02$, df = 9, $p = .000$
$R_L^2 = .14$ (based on $D_{0(\mathrm{logit})}$ and $G_M$)

Predictor                  b*        b     SE(b)   Student’s t     p
Logit(marijuana use)        -        -       -          -          -
Male                     -.010    -.066    .122      -0.54       .587
Black                    -.037    -.366    .160      -2.29       .022
Other                    -.024    -.358    .250      -1.43       .152
Exposure (lagged)         .266     .229    .010      23.78       .000
Belief (lagged)          -.229    -.212    .012     -18.19       .000
Age                      -.030    -.560    .232      -2.41       .016
Age^2                    -.172   -3.348    .249     -13.44       .000
Age^3                     .138    2.407    .217      11.08       .000
Age^4                    -.035    -.612    .202      -3.04       .000
Intercept/Constant          -    -1.394    .100     -13.90       .000


Based on calculation
of the direct contribution of each predictor to 
the explained variance in the dependent vari- 
able, using the same technique described above, 
the strongest predictors are exposure and belief, 
followed by age. The effect of ethnicity is statis- 
tically significant, with African-Americans in 
particular being less likely to use marijuana 
across this part of the life span, but substan- 
tively the effect is weak, with ethnicity (both 
Black and Other) accounting for little more than 
0.1% of the variance in marijuana use. The 
effect of gender is not statistically significant. If 
we plot the mean predicted values of marijuana 
use with age, we obtain the result in Figure 33.2. 
Note that by projecting the plot past age 10, 
we would obtain a negative predicted probabil- 
ity of marijuana use, illustrating the danger of 
projecting predicted values using polynomial 
functions past the range for which the polyno- 
mial function was calculated. Since the fourth 
power of age is negatively signed and dominates 
the function after about age 30, the same would 
occur for older ages. Broadly, one can think of
the model as (a) describing the pattern of marijuana
use across the life cycle from age 12 to
age 33, and (b) explaining some of the error in
prediction one would obtain using age alone as
a predictor in terms of gender, ethnicity, belief,
and exposure.

Figure 33.2 Predicted mean prevalence of
marijuana use (PPMRJ1) with age


6 Extensions and contrasts 


The multilevel models in Tables 33.1 and 33.2
can be extended to include interactions among
age and the other level 1 predictors (exposure
and belief), interactions among the level 2
predictors (gender and ethnicity), and interactions
across levels (e.g., an interaction between
exposure and gender), the last being represented by a
level 2 effect of gender on the level 1 coefficient
for exposure. The extension of the multilevel
logistic regression model to polytomous nomi- 
nal and ordinal variables is relatively straight- 
forward conceptually. A polytomous ordinal 
dependent variable will typically be modeled 
with a single set of $\beta$ parameters using a
cumulative logit model with the assumption 
of parallel slopes, and a polytomous nominal 
dependent variable with c categories will have 
c-1 separate functions (equations), as in logistic 
regression analysis more generally. The polyto- 
mous nominal model may also be applicable to 
ordinal polytomous dependent variables when 
the parallel slopes assumption is not justified. 
Multilevel change modeling for categorical 
dependent variables is different from two other 
approaches, panel analysis and event history 
analysis, in important ways. Panel analysis with 
logistic regression (see Menard, Chapter 30, in 
this volume) typically involves fewer than five 
separate time points, and does not necessarily 
include a time dimension in the model. For 
example, in Chapter 30, several of the mod- 
els in Tables 30.2 and 30.3 included no time 
dimension at all, and when they did include 
age as a time dimension, because there were 
only two waves of data, the age comparisons 
really involved between-individual rather than 
within-individual differences in marijuana use 
associated with age. A time dimension is one of 
the defining features of event history analysis,
and although not absolutely necessary in a mul- 
tilevel model (age and time may prove to have 
no statistically significant impacts on the out- 
come variable), we typically begin in multilevel 
longitudinal modeling by examining the rela- 
tionship of the outcome with at least one time 
dimension. Where multilevel change modeling 
and (usually) panel analysis differ from event 
history analysis is in not restricting the analysis 
to cases defined as being “at risk” of change in 
the direction of interest. In event history anal- 
ysis, initiation (whether someone who has not 
previously used marijuana begins to do so, for 
which only nonusers are at risk) or suspension 
(whether someone who has been using mari- 
juana ceases to do so, for which only users are at 
risk) would typically be the variables of interest, 
each with its own separate at risk set of cases. 
In multilevel analysis and (again usually; see 
Chapter 30, option d for measuring change in 
dichotomous dependent variables) panel anal- 
ysis, it is more typically prevalence (simply 
whether one does or does not use marijuana) 
that is the dependent variable of interest, and 
the risk set is typically defined as all respon- 
dents. Multilevel analysis, panel analysis, and 
event history analysis thus present a comple- 
mentary set of approaches for the analysis of 
longitudinal categorical data. 


7 Conclusion: multilevel logistic 
regression for longitudinal 
data analysis 


Most of the existing literature on multilevel 
modeling treats the analysis of discrete depen- 
dent variables generally, and logistic regression 
in particular, as an afterthought or a secondary 
issue, assuming a focus on interval/ratio depen- 
dent variables and partitioning of variance (and 
by the way, if you are unfortunate enough to 
have to deal with discrete dependent variables, 
here is something that may help). Here, the 


focus has been somewhat different, attempt- 
ing to strengthen the bridge between multilevel 
modeling and modeling discrete dependent 
variables. In this light, greater emphasis has 
been given than is usual in the multilevel mod- 
eling literature to the calculation of appropriate 
measures for explained variation to assess the 
overall impact of the predictors on the depen- 
dent variable, and standardized coefficients for 
assessing the relative impact within a model 
of predictors measured on different scales. The 
good news is that for the simplest (e.g., ran- 
dom intercept) models, one can relatively easily 
obtain estimates from existing software pack- 
ages. The bad news is that (a) in order to 
know for sure whether the simpler model is 
adequate, it may be necessary to first exam- 
ine the more complex models, (b) the statistical 
routines in general-purpose statistical software 
packages are not up to the task of estimating 
the more complex models, and one must rely 
instead on more specialized multilevel statisti- 
cal software (an observation also made by Luke, 
2004, p. 73), and (c) for more complex mod- 
els, it may not be possible to obtain maximum 
likelihood estimates, with all that implies for 
assessing the relative fit of different models and 
explained variation based on the criterion actu- 
ally being maximized in a logistic regression 
or logit model approach. The principal strength 
and importance of multilevel logistic regression 
modeling for longitudinal data is that it takes 
into account dependencies in the data occa- 
sioned by repeated measurement of the same 
cases (level 2 units) in modeling the relation- 
ship of the outcome with time (chronological or 
age), time-constant attributes, and time-varying 
covariates. 


Software 


Analysis of the relationship between mari- 
juana use and the orthogonal polynomial con- 
trast of age was performed using both SAS 
and SPSS logistic. The results in Tables 33.1 
and 33.2 were calculated using the statistical
package HLM (Raudenbush et al., 2004). Sim- 
ilar results were obtained for the population 
averaged and random intercept models using 
Stata xtlogit. In presently available software, 
it may be necessary to calculate the measures 
of explained variation and standardized logis- 
tic regression coefficients used here by hand, 
using the squared correlation between observed 
and predicted values of the dependent variable
to calculate $R_O^2$ (Menard, 2000 and 2002),
and using the formula described earlier in this 
chapter (and in more detail in Menard, 2002 
and 2004) for standardized coefficients. Popu- 
lation averaged models may be calculated using 
software other than dedicated multilevel soft- 
ware when that software permits adjustment for 
dependency in repeated measures within cases. 
For ordinal polytomous dependent variables, a 
test of the parallel slopes assumption may be 
available in the particular multilevel modeling 
software being used; if not, the best option for 
testing the assumption of parallel slopes may 
be to calculate the model using standard sta- 
tistical software such as SAS, SPSS, or Stata, 
solely for the purpose of choosing between the 
alternative polytomous multilevel models, then 
calculating the selected model using the appro- 
priate multilevel software. 
Chapter 34


A brief introduction to time 
series analysis 
Scott Menard 


1 Introduction 


This chapter provides a brief introduction to 
time series analysis, focusing on autoregressive 
integrated moving average (ARIMA) analysis of 
time series. A time series is a set of repeated 
measurements of the same variable taken on the 
same unit of analysis (e.g., an individual, city, 
nation; more generally, a subject or a case) for 
two or more points in time. As used in the social 
and behavioral sciences, time series analysis 
typically refers to a large number of observa- 
tions taken on a single case, typically at equal 
measurement intervals, possibly on more than 
one variable. Strictly speaking, even if we have 
a large number of cases N and a smaller number 
of time periods T we are, in fact, performing 
time series analysis, but the terminology used 
is that of two-wave or multiwave panel analy- 
sis, latent or multilevel growth curve or change 
analysis, and event history analysis, topics cov- 
ered in other chapters in this volume. Here we 
focus on analysis of a single case (N = 1) for a
large number of time periods (typically T > 20
and preferably T > 50 for most purposes,
sometimes T > 100).

Time series analysis will always have at 
least one of three goals: description, explana- 
tion, or forecasting. In principle, one can per- 
form time series analysis on variables at any 


level of measurement (dichotomous, nominal, 
ordinal, interval, or ratio-scaled variables), but 
in practice the techniques that are commonly 
described as time series analysis are applied to 
more or less continuous quantitative (interval 
or ratio-scaled) outcomes, with equal inter- 
vals between successive measurements. Focus- 
ing, then, on quantitative outcomes, we may 
use time series analysis to describe the level 
or value of the outcome variable as a func- 
tion of (1) time itself, (2) past levels or val- 
ues of the same outcome variable, (3) past and 
present values of a random change, or a ran- 
dom shock, to the level or value of the out- 
come, or (4) some combination of all three. 
In addition, we may attempt to explain the 
value of the outcome in terms of (5) one-time 
nonrandom shocks to the series, as in an inter- 
vention analysis in which an intervention or a 
policy change is tested to see whether it has 
an impact on the time series, or in terms of (6) 
one or more time-varying covariates, which are 
themselves time series and can be represented 
as inputs for the outcome variable. Finally, once 
we model the effects of time, past values of the 
outcome, random shock, and covariates, we can 
use that model to forecast future values of that 
outcome. 

An important consideration in time series 
analysis is whether the time series is stationary, 
an issue to which we shall return repeatedly
in this chapter. As described in any number 
of standard texts on time series analysis, a sta- 
tionary time series has a fixed mean and a con- 
stant variance about that mean. If, in addition, 
the autocovariance structure (the covariance 
between $z_t$ and $z_{t-s}$ for $s = 1, 2, \ldots, S < T$) is also
constant, the time series is said to be strictly stationary.
Nonstationary time series exhibit either
changing variance or changing mean, or both. 
A changing mean, or trend, may involve deter- 
ministic changes that are a function of time, 
predictable from past values of the time series; 
or they may be stochastic changes, the cumula- 
tive result of a series of random shocks which, 
at random, have more often “pushed” the time 
series up (or down) rather than in the opposite 
direction. Particularly in the case of bivariate 
time series analysis, if both series have simi- 
lar stochastic trends, the stochastic trends may 
produce a spurious positive correlation inflat- 
ing estimates of how well one series predicts 
or explains the other, or it may attenuate a true 
negative correlation between the two series. 
Alternatively, if the stochastic trends are in 
opposite directions, they may either attenuate a 
true positive correlation or produce a spurious 
negative correlation between the two series. 
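
The risk of a spurious positive correlation can be illustrated with a small simulation. The sketch below generates two series whose shocks are independent but which share an upward drift (an assumption made here so that the two trends are reliably similar), then compares the correlation of the levels with the correlation of the period-to-period changes. The seed, series length, drift, and function names are all arbitrary illustrative choices.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def random_walk(n, drift, rng):
    """Cumulative sum of independent normal shocks plus a constant drift."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def differences(series):
    """First differences z_t - z_(t-1)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(20)                     # fixed seed for reproducibility
series_a = random_walk(200, 1.0, rng)
series_b = random_walk(200, 1.0, rng)       # shocks independent of series_a

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))
print(round(r_levels, 3), round(r_changes, 3))
```

The levels are strongly correlated only because both series trend upward; once the trend is removed by differencing, the correlation between the independent shocks is close to zero.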


1.1 Notation 


The notation used in describing equations for 
time series analysis is not standard across dif- 
ferent sources. In this chapter, the following 
notation will be used; capital letters refer to 
variables, lower case letters to values of vari- 
ables or constants. 

$T$ is the number of times, or equivalently, the
number of observations in the time series; $t = 1, 2, \ldots, T$
may also be a variable in the equation
for the dependent variable.

$Z$ is the dependent variable; $z_t$ is the dependent
variable measured at time $t = 1, 2, \ldots, T$.

$\{Y_k\}$ is a set of $K$ time-varying covariates; $y_{kt}$
is the time-varying covariate $Y_k$, $k = 1, 2, \ldots, K$,
measured at time $t = 1, 2, \ldots, T$.

$\{X_j\}$ is a second set of $J$ covariates, possibly
but not necessarily time-varying; $x_{jt}$ is the
time-varying covariate $X_j$, $j = 1, 2, \ldots, J$, measured at
time $t = 1, 2, \ldots, T$.

A residual at time $t = 1, 2, \ldots, T$ is denoted $e_t$.
When the residual $e_t$ is a white noise residual
(random shock) it is denoted $a_t$.

Coefficients for lagged endogenous variables
$z_{t-1}, z_{t-2}, \ldots, z_{t-p}$ are denoted $\phi_p$,
$p = 1, 2, \ldots, P < T$.

Coefficients for past random shocks $a_{t-1}, a_{t-2}, \ldots, a_{t-q}$
are denoted $\theta_q$, $q = 1, 2, \ldots, Q < T$.

Coefficients for time-varying covariates are
denoted $\beta_{kt}$ or $\gamma_{jt}$, where $k$ or $j$ refers to the
covariate $Y_k$ or $X_j$, $k = 1, 2, \ldots, K$, $j = 1, 2, \ldots, J$,
and $t = 1, 2, \ldots, T$ is the time at which the
covariate was measured.

Coefficients for time ($t$) are denoted $\lambda$,
subscripted for different functions or transformations
of time, $f(t)$; in general,
$z_t = \lambda_0 + \lambda_1 f_1(t) + \lambda_2 f_2(t) + \cdots + \lambda_M f_M(t)$
for $m = 1, 2, \ldots, M$ different functions of time.


2 Describing or modeling the 
outcome as a function of time 


Some outcomes can be modeled as direct func- 
tions of time, with time conceived as a predictor 
but not a cause of the outcome. The compound- 
ing of interest for a savings bond is one example. 
A fixed amount is deposited, and the account 
grows at a fixed rate as long as the bond is 
held. The direct “causes” of this growth lie in 
(a) the agreement upon the terms of the con- 
tract between the buyer and the seller of the 
bond and (b) the decision of the bondholder to 
continue to hold the bond, but the growth in 
the account itself is a deterministic function of 
time, and the only information needed to pre- 
dict the total value of the account at any given 
time is (1) the value of the account at some 
previous time, (2) the amount of time lapsed 
since the time for which the previous value 
was obtained, and either (3a) the interest-rate 
terms for the account or (3b) a sufficient num- 
ber of values of the account at different prior 
times to calculate the interest rate, given general
knowledge of the interest-rate terms of the
account (e.g., annual, daily, or other compound- 
ing of interest), even without knowing the spe- 
cific interest rate. With (3a), knowledge of the 
interest-rate terms, we can directly calculate the 
value of the account at a given time. With (3b), 
knowledge of the time series of prior values of 
the account (plus a knowledge of the mathe- 
matics of compound interest), we can select a 
function of the appropriate form but with the 
interest rate as an unknown variable, use the 
time series data to calculate the interest rate, 
and from that information we can calculate the 
value of the account at any given time. 

To illustrate, assume that the account earns 
some unknown rate of interest, x, per year (or 
whatever time interval is appropriate). Then 
given an initial value of the account at time
zero, $z_0$, and measurements for at least one
other time, $t$, where the measurement interval
between times $t = 1, 2, \ldots, T$ corresponds to the
interval at which interest is compounded, we
know that $z_t = z_0(1 + x)^t$. In the case of a purely
deterministic model for which the function is
known, we need only one time point to solve for
$x$: $x = (z_t/z_0)^{1/t} - 1$, and given $x$ we can calculate
what $z_t$ will be for any time after (or for that matter,
all possible values of $z_t$ prior to) $t = 0$. With
the information about the form of the curve and
values of $z_t$ at different times, we have fit the
curve for the outcome (the value of the account)
by calculating the unknown parameter $x$.
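
The compound interest example can be worked through in a few lines of Python. The account values below are hypothetical (they correspond roughly to a 5% annual rate), and the function names are illustrative.

```python
def interest_rate(z0, zt, t):
    """Solve z_t = z_0 * (1 + x)**t for the per-period rate x."""
    return (zt / z0) ** (1.0 / t) - 1.0

def account_value(z0, x, t):
    """Value of the account after t compounding periods at rate x."""
    return z0 * (1.0 + x) ** t

z0 = 1000.00          # hypothetical initial deposit at t = 0
z5 = 1276.28          # hypothetical observed value at t = 5

x = interest_rate(z0, z5, 5)
z10 = account_value(z0, x, 10)
print(round(x, 4), round(z10, 2))
```

A single later observation is enough here only because the functional form is known in advance; the rate recovered from it then determines the value at every other time.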

More generally, the precise function defining 
the curve (or straight line) to be fit will not be 
known, and the first step in the analysis will 
be to estimate what that function might be. For 
this we need values of the outcome for many 
more than two points in time. Assuming no sys- 
tematic or random error, a longer time series 
should indicate whether the values of the out- 
come increase, decrease, or oscillate over time, 
whether (and if so how many times) the curve 
changes direction from increasing to decreasing 
or vice-versa, and based on this information we
may be able to choose a suitable function for
the curve, but for any given curve, there will be 
an indefinitely large number of possibilities. 

One possible basis for choosing one particular curve may be to have some idea of the underlying process, as in the previous example of compound interest. If we have no real knowledge of the underlying process, however, we may use the knowledge that any curve can be approximated by a polynomial of order $M < T$, and estimate the parameters for the equation. For a model with an additive effect of time (the value is changed by adding a fixed function of time),


$$z_t = \lambda_0 + \lambda_1 t + \lambda_2 t^2 + \cdots + \lambda_M t^M, \quad \text{where } \lambda_0 = z_0 \qquad (1)$$


that is, $z_t$ is equal to the intercept when $t = 0$.

Alternatively, the amount may be increased 
by some multiplicative function of time. There 
are several different multiplicative functions 
that could be used, only some of which are 
linear in their parameters (and hence can be 
estimated using ordinary least squares linear 
regression). One possibility is that the relation- 
ship can be expressed as an exponential func- 
tion of a polynomial function of time, 


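$$z_t = \exp(\lambda_0 + \lambda_1 t + \lambda_2 t^2 + \cdots + \lambda_M t^M) \qquad (2)$$


in which $\exp(x) = e^x$, where $e$ is the base of the natural logarithm, and which translates into


$$\ln(z_t) = \lambda_0 + \lambda_1 t + \lambda_2 t^2 + \cdots + \lambda_M t^M \qquad (3)$$


and here $\ln(z_0) = \lambda_0$, and $\ln(z_t)$ is expressed as a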
polynomial function of time that is linear in the 
parameters. Other possibilities exist involving 
other transformations of z, and other functions 
of time, and these possibilities can be modeled 
using existing time series software (e.g., SPSS 
CURVEFIT). 
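
As a minimal sketch of how Equation (3) can be estimated by ordinary least squares, the following Python code fits the first order case, $\ln(z_t) = \lambda_0 + \lambda_1 t$, using the closed-form simple regression estimates. The series is a noise-free illustration constructed for this purpose, and the function name is hypothetical; higher order polynomials would require a multiple regression routine.

```python
import math


def fit_loglinear(times, values):
    """OLS fit of ln(z_t) = lam0 + lam1 * t; returns (lam0, lam1)."""
    y = [math.log(v) for v in values]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    lam1 = sxy / sxx
    lam0 = y_bar - lam1 * t_bar
    return lam0, lam1


# Illustrative series generated from z_t = exp(0.5 + 0.1 t), with no random error.
times = list(range(8))
values = [math.exp(0.5 + 0.1 * t) for t in times]
lam0, lam1 = fit_loglinear(times, values)
print(round(lam0, 6), round(lam1, 6))
```

Because the model is linear in its parameters after the logarithmic transformation, the estimate of $\lambda_0$ recovers $\ln(z_0)$, as stated above.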

The order of the polynomial, $m = 1, 2, \ldots, M$, may be guessed from visual inspection of the curve for the number of changes in the curve between positive and negative trends, and $M$ should be set to at least one more than the number of changes in trend. Alternatively, one can test polynomial functions of different orders to see which of the coefficients $\lambda_1, \lambda_2, \ldots, \lambda_M$ can be dropped. For this purpose, because the various powers of $t$ if untransformed will be highly collinear, it will be necessary to use an orthogonal contrast (e.g., Scheffé, 1959) of the appropriate order for $t$ in order to ascertain which of the $\lambda_1, \lambda_2, \ldots, \lambda_M$ are statistically significant. There are different possible criteria for deciding which of the $\lambda$ coefficients to retain,
including (a) a hierarchical approach, retain- 
ing all coefficients, up to and including the 
last statistically significant coefficient, regard- 
less of the statistical significance of coefficients 
between the first and last statistically signif- 
icant coefficients, and (b) a nonhierarchical 
approach, dropping all coefficients that are not 
statistically significant but retaining all coef- 
ficients that are statistically significant, either 
in a single-step procedure (a single assessment 
of statistical significance, controlling for all of 
the other coefficients) or in a stepwise pro- 
cess that eliminates the least statistically sig- 
nificant coefficient first, then re-estimates the 
model and repeats the process until all of the 
remaining coefficients meet some criterion of 
statistical significance. One can also choose (c) 
to limit the powers of t a priori to some max- 
imum, for example 4 or 6. This last approach 
stems from the observation that in the natu- 
ral sciences, phenomena involving exponents 
larger than 3 or 4 are exceedingly rare, plus 
the concern that, by including higher powers 
of t, we may be overfitting the model, fitting 
random error variation rather than true vari- 
ation in the outcome. Alternatively, based on 
the assumption that the polynomial function is 
only approximating the true function, we may 
choose to include the higher powers of t if they 
appear to be statistically significant. 
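
The collinearity among untransformed powers of $t$ is easy to demonstrate. The sketch below compares the correlation between $t$ and $t^2$ before and after centering $t$ on its mean. Centering is used here only as a simple stand-in for the orthogonal contrasts mentioned above: on an evenly spaced, symmetric time scale it removes the correlation between the linear and quadratic terms, but it does not orthogonalize all powers of $t$. The time scale of 100 points is an assumption made for illustration.

```python
import math


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


t_raw = list(range(1, 101))                      # t = 1, 2, ..., 100
t_mean = sum(t_raw) / len(t_raw)
t_centered = [t - t_mean for t in t_raw]         # symmetric about zero

r_raw = corr(t_raw, [t ** 2 for t in t_raw])
r_centered = corr(t_centered, [t ** 2 for t in t_centered])
print(round(r_raw, 3), round(abs(r_centered), 3))
```

The raw linear and quadratic terms are almost perfectly correlated, while the centered terms are uncorrelated, which is why significance tests on the individual coefficients are only interpretable after a transformation of this kind.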


This approach may work well in the absence or near absence of random error, but any substantial amount of random error (or noise) in the time series may produce apparent increases or decreases at different points on the curve that lead to overestimation or underestimation of the value of the outcome, and more apparent changes in trend than are really characteristic of the true relationship between the outcome and time. To address this problem and the associated risk of overfitting the curve, curve smoothing techniques may be applied to reduce the random error at each point on the curve. One of the more common techniques of smoothing is to represent each point on the curve by a moving average. For example, $z_t$ may be represented as the average $(z_{t-1} + z_t + z_{t+1})/3$. Alternatively, one could take a longer moving average (e.g., 5 instead of 3 time points). The use of a moving average relies on the expectation that the sum of a series of random errors will be equal to zero, and hence the random errors from previous and subsequent values of $z_t$ will tend to cancel out the random error in $z_t$. One could also weight the moving average, for example representing $z_t$ by $(z_{t-1} + 2z_t + z_{t+1})/4$, giving more weight to the current value of $z_t$. This arises naturally when using double moving averages for smoothing. A double moving average is simply a moving average of moving averages, where one first takes a moving average of the $z_t$, producing new values $z_t'$, then takes the moving average of the $z_t'$ in turn. As illustrated in Yaffee and McGee (2000, p. 22), using a double moving average of length 3 for both the first and the second moving average, $z_t$ is expressed as $z_t'' = (z_{t-2} + 2z_{t-1} + 3z_t + 2z_{t+1} + z_{t+2})/9$. Once the curve has been smoothed using single or double moving averages, it may be possible to identify the appropriate function or to calculate an appropriate polynomial approximation to the function to describe the series. Once this is done, one can calculate residuals $e_t = z_t - \hat{z}_t$, where $\hat{z}_t$ is the estimated value of $z_t$ based on the fitted curve. These residuals can then be subjected to further analysis.
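
The equivalence between a double moving average of length 3 and the weighted average with weights 1, 2, 3, 2, 1 can be checked directly. The following sketch uses a short made-up series; the function names are hypothetical, and the moving average is centered, so the smoothed series is shorter than the original at both ends.

```python
def moving_average(z, length=3):
    """Centered moving average; returns len(z) - length + 1 smoothed values."""
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")
    return [sum(z[i:i + length]) / length for i in range(len(z) - length + 1)]


def double_moving_average(z, length=3):
    """Moving average of the moving averages."""
    return moving_average(moving_average(z, length), length)


z = [3.0, 5.0, 4.0, 8.0, 6.0, 9.0, 7.0]          # illustrative values only
dma = double_moving_average(z)

# Direct weighted form: (z[t-2] + 2 z[t-1] + 3 z[t] + 2 z[t+1] + z[t+2]) / 9
weighted = [
    (z[i - 2] + 2 * z[i - 1] + 3 * z[i] + 2 * z[i + 1] + z[i + 2]) / 9
    for i in range(2, len(z) - 2)
]
print(dma)
print(weighted)
```

Both calculations give the same smoothed values, and each pass of smoothing removes one observation from each end of the series.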

An example of curve fitting in the social 
sciences is the ambitious attempt by Hamblin 
et al. (1973) to describe processes of social 
change. Hamblin et al. model time series for 
such dependent variables as industrial startups, 
adoption of innovations such as hybrid corn 
and new drugs, general strikes, political assas- 
sinations, air passenger miles, gasoline con- 
sumption, and gross national product. Using a 
combination of differential equations and curve 
fitting, they derive a series of models, includ- 
ing linear, exponential, and logistic models, 
to describe (and provide theoretical explana- 
tions for) the evolution of these variables over 
time. The focus of these models is on (a) assumptions
about the nature of the process driving
change in the dependent variable over time, and 
(b) given the assumption of constancy in that 
process, modeling the dependent variable as a 
function of time, with allowances for change in 
the process which may result in changes in the 
direction or other characteristics of the curve 
for different “epochs” or periods within which 
the assumptions about the process are valid. 
For Hamblin et al., the theory generates expec- 
tations regarding the relationship of the depen- 
dent variable with time, which are then tested 
by fitting the appropriate function of time to the 
dependent variable. 


3 Describing or modeling the 
outcome as a function of present 
and past random shocks 


It is possible that there is no deterministic trend 
in the model, or that any such trend has been 
accounted for and removed from the model 
(e.g., by fitting the curve and calculating the 
residuals, as described above, leaving the resid- 
uals for further analysis). There may still remain 
systematic patterns to be described in the series. 
One possibility is that the outcome, $z_t$, can be described as the function of a random shock at time $t$, $a_t$, plus the lingering effects of a series of some number $q$ of past random shocks $a_{t-1}, a_{t-2}, \ldots, a_{t-q}$, occurring to the series after some initial time $t_0$ but prior to $t$ (with $z_t$ assumed to be measured immediately after $a_t$ occurs). The moving average process of order $q$, MA($q$), as described in Box and Jenkins (1970), is modeled as $z_t = a_t + \theta_1 a_{t-1} + \theta_2 a_{t-2} + \cdots + \theta_q a_{t-q}$, a linear function of the current random shock and the past $q$ random shocks. The $\theta$ coefficients are parameters to be estimated, and $q$ is to be identified based on the autocorrelation function (ACF) and partial autocorrelation function (PACF). The $\theta$ coefficients represent the magnitude of the effect of each of the past random shocks, and $q$ represents the length of time after which the effect of a previous random shock is completely (or at least for all practical purposes) dissipated. It is assumed that the random shocks have a mean of zero, i.e., $E(a_t) = 0$; if, instead, they vary around a nonzero mean, the series is said to exhibit drift, random variation about a nonzero mean, and the series is nonstationary.

In modeling an MA($q$) process, the first order of business is to identify $q$ using the ACF and PACF. The ACF represents the correlation of each value of $z$ with itself at different lags: $z_t$ with $z_{t-1}$ for all $t$ (lag 1); $z_t$ with $z_{t-2}$ for all $t$ (lag 2); and so forth, for some number of lags $\tau$, typically with $\tau < .5T$, and for longer series, $\tau$ may be substantially less than $.5T$ (e.g., 40 or fewer lags). The ACF is calculated as


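$$r_\tau = \frac{[1/(T-\tau)] \sum_{t=1}^{T-\tau} (z_t - \bar{z})(z_{t+\tau} - \bar{z})}{[1/(T-1)] \sum_{t=1}^{T} (z_t - \bar{z})^2} \qquad (4)$$


where $\bar{z}$ is the mean of the time series and $\tau$ is the number of lags for which autocorrelations are calculated. The standard error for $r_\tau$ is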


$$SE(r_\tau) = \sqrt{\frac{1}{T}\left(1 + 2\sum_{L=0}^{\tau-1} r_L^2\right)} \qquad (5)$$


where $L$ is the lag and $r_0^2$ is defined as $r_0^2 = 0$.
The PACF is the partial autocorrelation of $z$ with itself at each lag, controlling for autocorrelations at intervening lags: $z_t$ with $z_{t-2}$ controlling for the correlation of $z_t$ and $z_{t-2}$ with $z_{t-1}$; $z_t$ with $z_{t-3}$ controlling for the correlations of $z_t$ and $z_{t-3}$ with $z_{t-1}$ and $z_{t-2}$; etc. The PACF is calculated in an iterative process, beginning with PACF(lag 1) $= r_1$, PACF(lag 2) $= (r_2 - r_1^2)/(1 - r_1^2)$, and continuing for higher order partial autocorrelations expressed in terms of the current autocorrelation and lower order autocorrelations. The standard error for the partial autocorrelation is defined as $SE(\mathrm{PACF}) = 1/\sqrt{T}$.
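
A minimal sketch of these calculations follows. It assumes that Equation (4) averages the lagged cross-products over the $T - \tau$ available pairs and divides by the variance computed with $T - 1$, and that Equation (5) is the usual large-sample standard error with $r_0^2$ treated as zero; statistical packages may normalize slightly differently, so small discrepancies from software output are to be expected. The function names and the five-point series are illustrative only.

```python
import math


def acf(z, lag):
    """Autocorrelation at a given lag, following the form assumed for Equation (4)."""
    T = len(z)
    z_bar = sum(z) / T
    num = sum((z[t] - z_bar) * (z[t + lag] - z_bar) for t in range(T - lag)) / (T - lag)
    den = sum((v - z_bar) ** 2 for v in z) / (T - 1)
    return num / den


def acf_se(z, lag):
    """Standard error of r at a given lag, with r_0^2 defined as 0 (Equation 5)."""
    T = len(z)
    s = sum(acf(z, L) ** 2 for L in range(1, lag))   # lags 1 .. lag-1; lag 0 contributes 0
    return math.sqrt((1.0 + 2.0 * s) / T)


def pacf_lag2(z):
    """Partial autocorrelation at lag 2 from the first two autocorrelations."""
    r1, r2 = acf(z, 1), acf(z, 2)
    return (r2 - r1 ** 2) / (1.0 - r1 ** 2)


def pacf_se(z):
    """Standard error of the partial autocorrelation, 1 / sqrt(T)."""
    return 1.0 / math.sqrt(len(z))


z = [1.0, 2.0, 3.0, 4.0, 5.0]                        # illustrative series
print(acf(z, 1), acf(z, 2), pacf_lag2(z), acf_se(z, 2), pacf_se(z))
```

An autocorrelation or partial autocorrelation that exceeds roughly twice its standard error in absolute value is the kind of "spike" referred to in the identification rules that follow.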

The characteristic pattern for an MA($q$) process is that (1) the ACF “spikes” (is large enough to be statistically significant, and is markedly larger than other autocorrelations) at the first $q$ lags, where $q$ may in principle be $1, 2, \ldots, T-2$, and beyond lag $q$ the ACF is otherwise small and not statistically significant; and (2) the PACF declines fairly quickly as the length of the lag increases. Once $q$ has been identified, we proceed with the estimation of the $\theta$ coefficients. In an MA($q$) model these coefficients must satisfy the condition of invertibility. For an MA(1) process, invertibility exists when the absolute value of $\theta_1$ is less than 1: $|\theta_1| < 1$, that is, $-1 < \theta_1 < 1$. For an MA(2) process, there are three conditions: (1) $\theta_1 + \theta_2 < 1$; (2) $|\theta_2| < 1$; and (3) $\theta_2 - \theta_1 < 1$. More complicated conditions for invertibility exist for higher order MA processes, but MA processes with $q > 2$ are relatively rare in practice.
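
The cutoff in the ACF after lag $q$ can be seen from the theoretical autocorrelations of an MA($q$) process. The sketch below assumes the additive form used in this chapter, $z_t = a_t + \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q}$, with uncorrelated shocks of equal variance; under that assumption the lag-$k$ autocorrelation is $\sum_{i=0}^{q-k} \theta_i \theta_{i+k} / \sum_{i=0}^{q} \theta_i^2$ with $\theta_0 = 1$, and zero for $k > q$. The function name and coefficient values are illustrative.

```python
def ma_theoretical_acf(thetas, lag):
    """Theoretical autocorrelation of an MA(q) process at a given lag.

    thetas = [theta_1, ..., theta_q] for z_t = a_t + theta_1 a_{t-1} + ... + theta_q a_{t-q}.
    """
    weights = [1.0] + list(thetas)          # theta_0 = 1 is the weight on the current shock
    q = len(thetas)
    if lag > q:
        return 0.0                          # no shared shocks beyond lag q
    num = sum(weights[i] * weights[i + lag] for i in range(q - lag + 1))
    den = sum(w * w for w in weights)
    return num / den


ma1 = [ma_theoretical_acf([0.5], k) for k in (1, 2, 3)]
ma2 = [ma_theoretical_acf([0.5, 0.3], k) for k in (1, 2, 3)]
print(ma1)
print(ma2)
```

For the MA(1) example only the lag 1 autocorrelation is nonzero, and for the MA(2) example only the first two are, which is the pattern of spikes described above.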


4 Describing or modeling the 
outcome as a function of past values 
of the outcome (plus a,) 


Even when there is no deterministic trend in the data, a stochastic trend may appear as a result of the cumulative impact of random shocks in the series, even if there are no lingering effects of the random shocks. The simplest model for this stochastic trend is the random walk, in which a random shock added to the most recent value of the series produces the current value of the series: $z_t = z_{t-1} + a_t$ or equivalently $(z_t - z_{t-1}) = a_t$. This process is also described as an integrated process of order $d = 1$, or I(1), indicating that the difference between two adjacent values of the outcome is a random shock, $a_t$. Higher order integrated processes are also possible, starting with an integrated process of order 2, I(2), for which $(z_t - z_{t-1}) - (z_{t-1} - z_{t-2}) = z_t - 2z_{t-1} + z_{t-2} = a_t$. In practice, integrated processes of order higher than $d = 2$ are rare, but although calculation of such processes becomes increasingly tedious, they are possible. As is evident from the foregoing, in the integrated process, the value of the outcome depends directly on its past value(s) plus a random shock. The symbol $\nabla^d$ is often used as a shorthand for a difference corresponding to an integrated process of order $d$, such that $\nabla^1 z_t = (z_t - z_{t-1})$, $\nabla^2 z_t = (z_t - z_{t-1}) - (z_{t-1} - z_{t-2})$, etc. For an integrated process of order 1, I(1), probably the most common integrated process encountered in practice, the characteristic pattern of the ACF is that it declines gradually as the lag increases, while the PACF spikes for lag 1. Integration or differencing is parallel, in a limited way, to curve fitting using a polynomial function; taking the first difference removes the linear trend, taking the second difference removes the quadratic trend, and so forth, but unlike polynomial (or other) curve fitting, the use of first or higher order differences does not inform us about the pattern or shape of the observed relationship of the outcome variable with time.
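
The parallel between differencing and polynomial trends can be verified with two noise-free series, one linear and one quadratic in time. The series and the function name below are made up for illustration.

```python
def difference(z, d=1):
    """Apply d successive first differences to the series z."""
    for _ in range(d):
        z = [z[i] - z[i - 1] for i in range(1, len(z))]
    return z


linear = [3 + 2 * t for t in range(6)]            # 3, 5, 7, ...
quadratic = [1 + t + t * t for t in range(6)]     # 1, 3, 7, 13, ...

print(difference(linear, 1))       # constant: the linear trend is gone
print(difference(quadratic, 1))    # still trending after one difference
print(difference(quadratic, 2))    # constant: the quadratic trend is gone
```

Each difference shortens the series by one observation, and what remains after differencing carries no information about the shape of the trend that was removed.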

As an alternative to the integrated model using first or higher order differencing, one or more past values of the outcome may be hypothesized to have a lingering effect on $z_t$ that is not adequately captured by the simple integrated process; instead we model an autoregressive process of order $p$, AR($p$), where $p$ is the number of past values of the outcome on which the current value, $z_t$, depends. In an AR(1) process, $z_t = \phi_1 z_{t-1} + a_t$. This looks a lot like the random walk, except for the coefficient $\phi_1$, a parameter to be estimated that represents the dependence of $z_t$ on $z_{t-1}$. If $\phi_1 = 1$, we have not an AR(1) process but instead an I(1) process; for an AR(1) process, it must be the case that $-1 < \phi_1 < 1$. For an AR(2) process, $z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$. Like the MA($q$) process, the AR($p$) process may be identified based on the ACF and PACF, with the characteristic pattern of an AR($p$) process being somewhat similar to the pattern for an I($d$) process and a mirror image of the pattern for an MA($q$) process. For an AR($p$) process, it is the ACF rather than the PACF that declines fairly quickly as the lag increases, and the PACF rather than the ACF that has “spikes” at the $p$ lags for which we can expect the effects of prior values of the outcome to be significant. The difference between this and the I($d$) process is that for the I($d$) process, the ACF declines more gradually as the lag increases than for the AR($p$) process.

Also parallel to the MA($q$) process, there are conditions that must be met for the AR($p$) process to be estimated, but here they are conditions of stationarity rather than invertibility. For an AR(1) process, stationarity requires that $|\phi_1| < 1$ [and as noted above, if $\phi_1 = 1$, the supposed AR(1) process is actually an I(1) process]. Note also that, as implied above, the autocorrelation may be either positive or negative; i.e., for some time series, there is a tendency for oscillation in the value of $z_t$ such that a high value of $z_t$ tends to be followed by a low value of $z_{t+1}$, characteristic of a process that strives to reach an equilibrium. For an AR(2) model, stationarity requires that (1) $\phi_1 + \phi_2 < 1$; (2) $|\phi_2| < 1$; and (3) $\phi_2 - \phi_1 < 1$. More complicated AR($p$) models have more complicated conditions of stationarity, but in practice AR($p$) models of order greater than $p = 2$ are relatively rare.
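
The three AR(2) conditions can be checked against the equivalent requirement that the roots of $1 - \phi_1 B - \phi_2 B^2 = 0$ lie outside the unit circle. The sketch below assumes the AR(2) form given above, $z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$; the function names and the coefficient values tried are illustrative.

```python
import cmath


def ar2_conditions(phi1, phi2):
    """The three stationarity inequalities for an AR(2) process."""
    return (phi1 + phi2 < 1) and (abs(phi2) < 1) and (phi2 - phi1 < 1)


def ar2_roots_outside_unit_circle(phi1, phi2):
    """True if every root B of 1 - phi1*B - phi2*B**2 = 0 has modulus above 1."""
    if phi2 == 0:
        # The equation reduces to the AR(1) case, 1 - phi1*B = 0.
        return True if phi1 == 0 else abs(1.0 / phi1) > 1
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    roots = [(-phi1 + disc) / (2 * phi2), (-phi1 - disc) / (2 * phi2)]
    return all(abs(r) > 1 for r in roots)


cases = [(0.5, 0.3), (0.5, 0.6), (0.0, -1.2), (1.2, -0.5)]
for phi1, phi2 in cases:
    print(phi1, phi2, ar2_conditions(phi1, phi2), ar2_roots_outside_unit_circle(phi1, phi2))
```

The two checks agree for each pair of coefficients; the last case has complex roots, which corresponds to the oscillating behavior described above.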


4.1 The Dickey-Fuller and augmented 
Dickey-Fuller tests 


As noted earlier, stationarity in a time series means that there is a constant mean, and constant variance about that mean. In particular, in order for a time series to be stationary, it is necessary to account for and remove any trend, stochastic or deterministic. A deterministic trend may be removed by curve fitting using the methods described in Section 1 of this chapter, calculating the residuals, and performing the time series analysis on the residuals of the fitted curve. Stochastic trends, as in a random walk, may often be removed by taking first or second differences. It is possible to test for the presence of an I(1) component to the time series using the Dickey-Fuller or augmented Dickey-Fuller tests. The Dickey-Fuller tests calculate an autoregressive model and test whether the coefficient $\phi_1$ is statistically different from one. If it is not, it will be necessary to difference the series to achieve stationarity. Different versions of the augmented Dickey-Fuller tests provide evidence regarding the presence of a random walk without drift, a random walk with drift, or a trend in a time series. The Dickey-Fuller or augmented Dickey-Fuller tests and other tests for whether $\phi_1 = 1$ (or, as it is sometimes described, whether the series has a unit root) are commonly available in time series analysis software.
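
The logic of the simplest Dickey-Fuller test can be sketched by regressing the first difference on the lagged level, $(z_t - z_{t-1}) = \alpha + \gamma z_{t-1} + e_t$, where $\gamma = \phi_1 - 1$, and examining the $t$ statistic for $\gamma$. The code below is an illustration under stated assumptions, not a substitute for the routines in statistical packages: the series are simulated, the function names are hypothetical, and the value $-2.86$ is used as an approximate large-sample 5% critical value for the version of the test with a constant and no trend. The statistic does not follow the usual $t$ distribution, which is why special critical values are needed.

```python
import math
import random


def dickey_fuller_t(z):
    """t statistic for gamma in (z_t - z_{t-1}) = alpha + gamma * z_{t-1} + e_t."""
    x = z[:-1]
    y = [z[i] - z[i - 1] for i in range(1, len(z))]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    gamma = sxy / sxx
    alpha = y_bar - gamma * x_bar
    sse = sum((b - alpha - gamma * a) ** 2 for a, b in zip(x, y))
    se_gamma = math.sqrt(sse / (n - 2) / sxx)
    return gamma / se_gamma


def simulate_ar1(phi, n, seed):
    """Simulate z_t = phi * z_{t-1} + a_t with standard normal shocks."""
    rng = random.Random(seed)
    z = [0.0]
    for _ in range(n - 1):
        z.append(phi * z[-1] + rng.gauss(0.0, 1.0))
    return z


CRITICAL_5PCT = -2.86                      # assumed approximate critical value (constant, no trend)

t_stationary = dickey_fuller_t(simulate_ar1(0.2, 500, seed=1))   # clearly stationary AR(1)
t_walk = dickey_fuller_t(simulate_ar1(1.0, 500, seed=1))         # random walk, phi_1 = 1
print(t_stationary < CRITICAL_5PCT, round(t_stationary, 2), round(t_walk, 2))
```

A statistic well below the critical value, as for the stationary series, rejects the unit root; a statistic above it, which is what a random walk will usually produce, indicates that the series should be differenced.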


5 The ARIMA(p,d,q) model 


A time series may be described as a combination of autoregressive, integrated, and moving average components: AR($p$) plus I($d$) plus MA($q$), or ARIMA($p,d,q$). Note that the AR($p$) and I($d$) are both processes that model the current value of the outcome as a function of one or more past values of the outcome. One might thus expect to find one or the other, but not both, processes operating in a single model, but, in principle, all three processes could be combined in a single model. For example, an ARIMA(1,1,1) model may be written $(z_t - z_{t-1}) = \phi_1(z_{t-1} - z_{t-2}) + \theta_1 a_{t-1} + a_t$, or $\nabla^1 z_t = \phi_1 \nabla^1 z_{t-1} + \theta_1 a_{t-1} + a_t$. In English, the difference between the current and the immediately past value of the outcome is equal to the previous difference (the difference between the immediately past value of the outcome and the value of the outcome immediately prior to that) multiplied by $\phi_1$, plus the value of the immediately past random shock multiplied by $\theta_1$, plus the current random shock $a_t$. Stated another way, the present value of $z_t$ is equal to (1) a random shock $a_t$; plus (2) a linear function of the previous random shock, $\theta_1 a_{t-1}$, the MA(1) component; plus (3) the previous value of the outcome, $z_{t-1}$, the I(1) component; plus (4) a linear function of the previous difference in the values of the outcome, $\phi_1(z_{t-1} - z_{t-2})$, the AR(1) component.
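
The ARIMA(1,1,1) recursion can be traced by hand or in a few lines of code. The sketch below feeds a single unit shock into the model and follows its effect on the level of the series; the coefficient values, the starting values of zero, and the function name are assumptions made for illustration.

```python
def simulate_arima_111(shocks, phi1, theta1, z_start=0.0):
    """Generate z_t from (z_t - z_{t-1}) = phi1*(z_{t-1} - z_{t-2}) + theta1*a_{t-1} + a_t.

    The difference and the shock before the first period are assumed to be zero.
    """
    z_prev, diff_prev, a_prev = z_start, 0.0, 0.0
    path = []
    for a in shocks:
        diff = phi1 * diff_prev + theta1 * a_prev + a   # AR(1) and MA(1) parts act on the difference
        z = z_prev + diff                               # I(1) part: add the difference to the last level
        path.append(z)
        z_prev, diff_prev, a_prev = z, diff, a
    return path


# One unit shock at the first time point, none afterwards.
path = simulate_arima_111([1.0, 0.0, 0.0, 0.0], phi1=0.5, theta1=0.4)
print([round(v, 3) for v in path])
```

The successive differences are 1, 0.9, 0.45, and 0.225, so the shock raises the level permanently (the integrated component) while its effect on the differences fades (the autoregressive and moving average components).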

As noted earlier, the order of each of the three types of processes is relatively small in practice. For AR($p$) and MA($q$) processes in particular, there is a theoretical reason why this should be the case. Briefly, an AR($p$) model converges to an MA(1) model as $p$ becomes infinitely large, and an MA($q$) process converges to an AR(1) model as $q$ becomes infinitely large. The larger the order of $p$ or $q$, the better the AR($p$) or MA($q$) processes can be represented by a lower order process of the opposite type. Identification of which one or more of the processes best describes the time series is based on examination of the ACF and PACF, as described earlier. It is not always the case, however, that the ACF and PACF unambiguously identify $p$, $d$, and $q$ for a time series. In this case, it may be useful to compare different possible models for how well they describe the time series. The Dickey-Fuller test is one criterion to consider. A second is whether all of the coefficients ($\phi$ and $\theta$) in the model are statistically significant; if not, the model is rejected, and a different model, without the nonsignificant coefficients, is more appropriate. There are also several general tests of model fit, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Schwarz Bayes criterion (SBC), available when the parameters of the model are estimated using maximum likelihood (see, e.g., Wei, 2006, pp. 156-157). The lower the AIC, BIC, or SBC, the better the fit of the model. As noted in Wei (2006), the BIC may be preferable to the older AIC, because the AIC has a tendency to overparameterize the model. As noted in Tabachnick and Fidell (2007, 18.16), the AIC has the property that the difference in two AIC statistics is distributed as a $\chi^2$ statistic with appropriate degrees of freedom, when one model is nested within the other, thus giving us a statistical test of the difference in the AIC for the two models: $\chi^2 = \text{Larger AIC} - \text{Smaller AIC}$. For further details on these and other tests used in time series analysis, see, e.g., Cromwell, Labys and Terraza (1994) and Cromwell, Hannan, Labys and Terraza (1994).
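
As a worked illustration of the AIC difference test, the sketch below applies it to the AIC values reported later in this chapter (Table 34.2) for the ARIMA(0,1,0) and ARIMA(0,1,1) models of the Ohio and Wisconsin homicide series. It assumes one degree of freedom, because the two models differ by a single parameter, and it evaluates the upper tail of the $\chi^2$ distribution with one degree of freedom through the complementary error function; the function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def aic_difference_test(aic_a, aic_b):
    """Return (larger AIC - smaller AIC, p value), assuming the models differ by one parameter."""
    diff = abs(aic_a - aic_b)
    return diff, chi2_sf_1df(diff)


# AIC values as reported in Table 34.2: ARIMA(0,1,0) versus ARIMA(0,1,1).
diff_ohio, p_ohio = aic_difference_test(127.3998, 127.1818)
diff_wisconsin, p_wisconsin = aic_difference_test(52.5839, 41.8033)
print(round(diff_ohio, 4), round(p_ohio, 3))
print(round(diff_wisconsin, 4), round(p_wisconsin, 3))
```

The resulting probabilities are approximately .64 for Ohio and .001 for Wisconsin, in line with the values cited in the discussion of the homicide models.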


6 Example: IBM stock prices 


Figure 34.1 is a chart of IBM common stock closing prices taken daily from May 17, 1961 to November 2, 1962, Box and Jenkins’ (1976, p. 526) Series B, a well-known example in time series analysis. The series appears to increase, then decrease, then increase again (and perhaps decrease thereafter), and seems like a reasonable candidate for polynomial curve fitting. Here, we calculate a fourth order polynomial to account for the three changes in direction in the curve. The values of the time dimension have been transformed to reduce the collinearity among the different powers of time in the polynomial equation.

Figure 34.1 IBM common stock closing prices, daily, May 17, 1961 to November 2, 1962

Source: Calculated from Box and Jenkins (1976, p. 526) series B

Figure 34.2 Quadratic model for IBM stock price

Equation: $\text{Price} = 514.488 - 239.633t - 689.325t^2 + 391.874t^3 + 170.080t^4$; $R^2 = .891$, p = .000; all coefficients statistically significant (p = .000).

Figure 34.2 presents the results of fitting the fourth order polynomial to the IBM stock price data. The darker curve represents the predictions generated by the equation: $\text{Price} = 514.488 - 239.633t - 689.325t^2 + 391.847t^3 + 170.080t^4$. Again, the variable $t$ (time) has been transformed to reduce collinearity, so the numeric values of the coefficients are not themselves meaningful; what they do is (a) indicate that the coefficients for the linear and quadratic terms are negative, while the coefficients for the cubic and quartic terms are positive, and (b) produce the predicted curve in Figure 34.2. This predicted curve is overlaid with the observed values of the curve from Figure 34.1. We can see that prediction is good earlier in the series, but the fourth order polynomial fails to capture the pattern later in the series, suggesting that a higher order polynomial may be necessary to better approximate the series. The polynomial curve provides a description of the pattern of the IBM stock prices over time; but absent some theory of why we would expect the process to follow the pattern represented or approximated by a fourth or higher order polynomial, it offers little insight into the process by which the pattern was generated.

Figure 34.3 presents the autocorrelation and 
partial autocorrelation functions for the same 
IBM price data. The autocorrelation function in 
Figure 34.3 appears to display the very grad- 
ual decline characteristic of an integrated pro- 
cess, and the partial autocorrelation function 
spikes at lag 1, suggesting an I(1) process. An 
augmented Dickey-Fuller test confirms that the 
series appears to have a unit root. In Table 34.1, 
the ARIMA(0,1,0) model is calculated, along 
with three alternative models: an ARIMA(1,0,0) 
model or equivalently an AR(1) model, to fur- 
ther illustrate why the I(1) model is prefer- 
able to the AR(1) model here; an ARIMA(0,1,1) 
model representing a random walk with drift; 
and an ARIMA(1,1,0) model, which will be of 
more interest shortly. Results of some other 
models are summarized below Table 34.1 to 
explore the possibility that a higher order 
autoregressive or moving average parameter 
might improve the model. AIC and BIC statis- 
tics were included in the output and are also 
presented here. As noted earlier, the AIC tends 
to overparameterize ARIMA models, but it is 
included here at least for illustrative purposes 
because of its widespread use (in both time 
series analysis and other applications). 

Figure 34.3 Correlogram for IBM daily stock prices

First, note the magnitude of the $\phi$ coefficient in the ARIMA(1,0,0) model. As indicated earlier, the augmented Dickey-Fuller test shows that this coefficient is not statistically significantly different from one, indicating that an I(1) model is more appropriate than an AR(1) model. Second, comparing the AIC and BIC across the models, the ARIMA(0,1,0) model has the lowest BIC but not the lowest AIC; instead, the AIC is lowest for the ARIMA(0,1,1) model, and the AIC for the ARIMA(1,1,0) model is practically identical to that for the ARIMA(0,1,1) model. Which model is best? According to the BIC, which is lowest for the ARIMA(0,1,0) model, the random walk without drift is not only the most parsimonious but also the best fitting model. The statistically significant coefficients in the ARIMA(0,1,1) and ARIMA(1,1,0) models, however, suggest that the series contains something more than a random walk without drift. The AIC and BIC are practically identical for the ARIMA(0,1,1) and ARIMA(1,1,0) models, but the tiny difference that does exist favors the ARIMA(0,1,1) random walk with drift model. This is the conventional conclusion (Box and Jenkins, 1970, p. 186, Table 6.4) regarding the IBM stock price series. An important point here is that identification of the appropriate parameterization for a time series model is not always entirely clear, and different criteria may lead us to select different models.

Table 34.1 Comparison of fit statistics for IBM stock price models

Dependent         Statistics/     ARIMA          ARIMA          ARIMA          ARIMA
variable          Coefficients    (0,1,0)        (1,0,0)        (0,1,1)        (1,1,0)

IBM stock price   AIC             2506.188       2519.405       2505.483       2505.485
                  BIC             2514.004       2531.137       2517.207       2517.210
                  DF                             3              3              3
                  $\phi_1$ (p)    -              .996 (.000)*   -              .085 (.015)*
                  $\theta_1$ (p)  -              -              .085 (.015)*   -
                  constant        -.279 (.474)   438.2 (.000)   -.279 (.531)   .280 (.537)

ARIMA(2,0,0): $\phi_1 + \phi_2 = 1$
ARIMA(0,0,2): $\theta_1 + \theta_2 > 1$
ARIMA(0,2,0): AIC = 2721.572, BIC = 2729.283, 2df
ARIMA(0,0,1): AIC = 3863.924, BIC = 3874.657, 3df; $\theta_1$ = .931, p = .000

As noted earlier, fitting a fourth order polynomial to the IBM price data provides a reasonable description of the pattern, but little insight into the process. By contrast, the ARIMA model
does the opposite. Based on this analysis (just 
a replication of a classic example), the ARIMA 
model indicates that the process generating the 
series is simply a nondeterministic function of 
random shocks to the series, with some lin- 
gering effect of the immediately previous ran- 
dom shock. This is the definition of a random 
walk with drift. However, knowing that the pro- 
cess is a random walk by itself tells us nothing 
about the pattern of the stock price over time, 
and might even lead to the mistaken conclu- 
sion that there was no trend in the stock price. 
Instead we have a nonlinear stochastic trend, 
better described (as opposed to explained) by 
the polynomial equation than by the specifica- 
tion of the model as ARIMA(0,1,0). The curve 
fitting and ARIMA approaches thus provide us 
with different and complementary information 
about the pattern and the process, respectively, 
of the time series. 


7 Example: homicides in three 
midwestern states 


In Figures 34.4, 34.5, and 34.6, ACFs and PACFs 
are presented for the homicide rate per 100,000 
people from 1933 to 1980 in Ohio, Wisconsin, 
and Illinois, using data from Kohfeld and
Decker (1990). The mean has been subtracted
from each of the time series to center the series. 
The patterns in the three series are superfi- 
cially similar: the ACF declines more or less 
rapidly, and the PACF has a spike at lag 1 for 
all three series. This appears to be the only 
spike for the Ohio data, while the data for Illi- 
nois suggest a possible second spike, and the 
data for the PACF for Wisconsin trail off more 
slowly. Underlying these patterns are three dif- 
ferent models for the process generating the 
three series. 

Table 34.2 presents the same ARIMA models for homicides in Ohio, Wisconsin, and Illinois as were presented for the IBM stock price series. Once again, the ARIMA(1,0,0) model is presented to illustrate the need for an I(1) component to the model. In all three ARIMA(1,0,0) models, the $\phi$ coefficient is large, and, based on the augmented Dickey-Fuller tests, not significantly different from one. For Ohio in Table 34.2, the ARIMA(0,1,0) model is clearly the best choice of the models presented. In both the ARIMA(0,1,1) and the ARIMA(1,1,0) models the coefficients ($\theta$ and $\phi$) do not attain statistical significance, and although the AIC is lowest for the ARIMA(0,1,1) model, the difference from the ARIMA(0,1,0) model is not
statistically significant (p = .640). The BIC is

Figure 34.4 Correlogram for Ohio homicide data

Figure 34.5 Correlogram for Wisconsin homicide data

corrgram ilhomic

LAG   AC        PAC       Q        Prob>Q
1      0.8567    0.9150   37.48    0.0000
2      0.7778    0.2688   69.042   0.0000
3      0.6828   -0.1405   93.909   0.0000
4      0.6232    0.0595   115.09   0.0000
5      0.5526    0.0612   132.14   0.0000
6      0.4723   -0.0849   144.88   0.0000
7      0.3730    0.0139   153.03   0.0000
8      0.2610   -0.1976   157.11   0.0000
9      0.1714   -0.2477   158.92   0.0000
10     0.0671   -0.0837   159.2    0.0000
11    -0.0152   -0.0280   159.22   0.0000
12    -0.0693    0.0617   159.54   0.0000
13    -0.0611    0.4657   159.79   0.0000
14    -0.0835    0.2072   160.29   0.0000
15    -0.1074    0.0351   161.13   0.0000
16    -0.1177    0.0269   162.16   0.0000
17    -0.1265   -0.0658   163.4    0.0000
18    -0.1058    0.0651   164.3    0.0000
19    -0.1128   -0.0542   165.35   0.0000
20    -0.0976    0.0111   166.17   0.0000
21    -0.1130   -0.3092   167.3    0.0000
22    -0.1045    0.2516   168.31   0.0000

Figure 34.6 Correlogram for Illinois homicide data
Table 34.2 ARIMA models for homicide rates

Dependent variable: Homicide

State      Statistic/       ARIMA           ARIMA           ARIMA           ARIMA
           coefficient      (0,1,0)         (1,0,0)         (0,1,1)         (1,1,0)

Ohio       AIC              127.3998        130.5507        127.1818        127.4642
           BIC              131.1001        136.1643        132.7322        133.0147
           DF               2               3               3               3
           $\phi_1$ (p)     -               .877 (.000)*    -               -.199 (.262)
           $\theta_1$ (p)   -               -               -.224 (.156)    -
           constant (p)     .006 (.966)     .587 (.583)     .008 (.949)     .007 (.956)

Wisconsin  AIC              52.5839         54.1320         41.8033         48.7427
           BIC              56.2842         59.7456         47.3537         54.2932
           DF               2               3               3               3
           $\phi_1$ (p)     -               .859 (.000)*    -               -.345 (.011)*
           $\theta_1$ (p)   -               -               -.597 (.000)*   -
           constant (p)     .038 (.518)     .068 (.858)     .042 (.051)     .041 (.340)

Illinois   AIC              142.9423        147.2737        141.5559        140.7052
           BIC              146.6426        152.8873        147.1064        146.2556
           DF               2               3               3               3
           $\phi_1$ (p)     -               .903 (.000)*    -               -.292 (.034)*
           $\theta_1$ (p)   -               -               -.245 (.073)    -
           constant (p)     .047 (.766)     .808 (.666)     .053 (.658)     .051 (.684)

A single hyphen marks a parameter that is not part of the model.


*p < .050


LAG       AC        PAC          Q        Prob>Q
 1     -0.2027   -0.2027       2.0569    0.1515
 2     -0.0590   -0.1091       2.2353    0.3270
 3      0.1111    0.0854       2.8819    0.4102
 4     -0.1234   -0.1026       3.6976    0.4485
 5      0.2409    0.2438       6.8792    0.2298
 6     -0.1238   -0.0620       7.7405    0.2577
 7      0.0875    0.1460       8.1812    0.3169
 8     -0.0074   -0.0426       8.1844    0.4157
 9     -0.1297   -0.0860       9.2044    0.4186
10      0.2578    0.1904      13.34      0.2053
11     -0.2066   -0.1412      16.071     0.1385
12     -0.1836   -0.3199      18.29      0.1072
13      0.1579    0.1497      19.979     0.0957
14      0.0117   [illegible]  19.988     0.1305
15      0.0835    0.0591      20.49      0.1539
16     -0.1119   -0.1215      21.419     0.1629
17     -0.0154   -0.0029      21.438     0.2073
18      0.1327    0.1066      22.836     0.1970
19     -0.1527   -0.0430      24.754     0.1688
20      0.1083    0.1698      25.754     0.1741
21     -0.0254   -0.0874      25.812     0.2137


Figure 34.7 Correlogram for Ohio homicide data 


lowest for the ARIMA(0,1,0) model, suggest- 
ing that in this instance the AIC may lead us 
to overparameterize the model. For Wiscon- 
sin, however, the random walk with drift rep- 
resented by the ARIMA(0,1,1) model appears 
to provide the best fit. It has the lowest AIC 
and BIC by some considerable margin [and the 
AIC difference with the ARIMA(0,1,0) model 
is statistically significant, p = .001], and the 
$\theta$ coefficient is statistically significant. Separate
testing of ARIMA(1,1,1) and models with
higher order autoregressive or moving average 
components produced statistically nonsignifi- 
cant coefficients and no improvement in model 
fit. A third model, this time an ARIMA(1,1,0), 
appears to provide the best fit for the homicide
data from Illinois. The $\phi$ coefficient is
statistically significant in the ARIMA(1,1,0)
model [the $\theta$ coefficient in the ARIMA(0,1,1)
model is not], and both the AIC and the BIC 
are smallest for this model. Again, testing 
more complex models produced nonsignificant
coefficients and no improvement in fit. For
all three models, residuals were computed 
and their ACFs examined; all autocorrelations 
were nonsignificant, indicating that the resid- 
uals appear to be white noise. This is illus- 
trated in Figure 34.7 for the residuals from 
the time series analysis of the Ohio data; 
the residuals for the Wisconsin and Illinois 
data follow a similar pattern of nonsignificant 
autocorrelations. 
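
The residual check described above can be sketched numerically. The following is a minimal illustration, not the routine used to produce the figures: it assumes the Ljung-Box form of the portmanteau Q statistic, which appears consistent with the Q column of Figure 34.6 at lag 1, and the function names `sample_acf` and `ljung_box_q` are hypothetical helpers introduced here.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_1, ..., r_nlags of a series."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:-k]) / denom
                     for k in range(1, nlags + 1)])

def ljung_box_q(acf, n):
    """Cumulative Ljung-Box Q: n(n+2) * sum_k r_k^2 / (n - k)."""
    acf = np.asarray(acf, dtype=float)
    lags = np.arange(1, len(acf) + 1)
    return n * (n + 2) * np.cumsum(acf ** 2 / (n - lags))

# Lag-1 autocorrelation of 0.8567 with n = 48 gives Q of about 37.48.
q_lag1 = ljung_box_q([0.8567], n=48)[0]
```

Each Q value would then be compared with a chi-square distribution whose degrees of freedom equal the number of lags (less the number of estimated ARMA parameters when the series consists of model residuals); nonsignificant values are what one expects from white noise.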


8 Extensions to the simple 
univariate model 


The univariate ARIMA model described above 
can be extended in several ways, including 
(a) the incorporation of seasonality and cyclic- 
ity into the models; (b) modeling interventions 
in time series natural experiments or quasi- 
experimental designs; and (c) addition of time- 
varying covariates or predictors to the model. 
Cyclicity refers to recurrent patterns that repeat
over time, with evenly spaced “peaks” and
“valleys” in the series; an example would be
the 22-year sunspot cycle (including changes in
polarity). Seasonality is a more specific form of
cyclicity, characteristic of time series such as
retail sales and airplane passenger miles, which
tend to be higher in some months and lower
in others, over a 12-month cycle. Cyclicity
in general, and seasonality more specifically,
show up as regularly spaced peaks and
valleys in the ACFs and PACFs. They are
modeled by adding a seasonal component to
the ARIMA model. The standard notation for
the seasonal AR, I, and MA components of the
model is (P,D,Q), with capital letters for the
seasonal components as opposed to lower case
letters for the nonseasonal components. The
seasonal ARIMA(p,d,q)(P,D,Q) model includes
autoregressive, integrated, and moving average
parameters for both the seasonal and the nonseasonal
components. For example, a seasonal
ARIMA(0,1,0)(0,1,0)$_{12}$, where the subscript 12
refers to the periodicity of the cyclical or seasonal
effect (in this instance, for a seasonal
model, 12 months), could be written as

$(z_t - z_{t-1}) - (z_{t-12} - z_{t-13}) = (z_t - z_{t-12}) - (z_{t-1} - z_{t-13}) = z_t - z_{t-1} - z_{t-12} + z_{t-13} = a_t$.

This involves differencing for both the first order integrated I(d) =
I(1) effects $(z_t - z_{t-1})$ and the parallel $(z_{t-12} - z_{t-13})$,
and also the seasonal I(D) = I(1) effects
$(z_t - z_{t-12})$ and the parallel $(z_{t-1} - z_{t-13})$. Autoregressive
and moving average effects are similarly
incorporated into the model. As a practical
matter, cyclic or seasonal models can be incor- 
porated into the ARIMA model and estimated 
using the same software as for a nonseasonal 
ARIMA model. As such, they represent an 
extension to the ARIMA model that is method- 
ologically relatively easy to incorporate, but is 
significant in its substantive importance. An 
alternative to ARIMA modeling of cyclic pat- 
terns is spectral analysis, as described in detail 
by Jenkins and Watts (1968) and more briefly
by, e.g., Wei (2006); see also Wei, Chapter 35,
in this volume.
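
The double differencing implied by the seasonal ARIMA(0,1,0)(0,1,0) model with period 12 can be illustrated with a short sketch. The function name `seasonal_and_first_difference` is a hypothetical helper, and the example series (a linear trend plus a fixed monthly pattern) is invented for illustration only.

```python
import numpy as np

def seasonal_and_first_difference(z, period=12):
    """Return (z_t - z_{t-1}) - (z_{t-period} - z_{t-period-1})."""
    z = np.asarray(z, dtype=float)
    first = z[1:] - z[:-1]                    # regular differencing, I(d) = I(1)
    return first[period:] - first[:-period]   # seasonal differencing, I(D) = I(1)

# Illustrative series: linear trend plus a repeating 12-month pattern.
t = np.arange(48)
monthly_pattern = np.array([3, 1, 0, -2, -1, 0, 2, 4, 1, -3, -2, 5])
z = 2.0 * t + np.tile(monthly_pattern, 4)

w = seasonal_and_first_difference(z)  # trend and seasonal pattern both removed
```

Because the illustrative series contains only a deterministic trend and a fixed seasonal pattern, the doubly differenced series is identically zero; with real data the result would be the series to which any remaining AR and MA terms are fitted. Note that 13 observations are lost to differencing.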


8.1 Intervention analysis 


In quasi-experimental research (e.g., Shadish 
et al., 2002), the simple time series design is rep- 
resented as a series of measurements of the out- 
come, split into two segments by the insertion 
of an intervention or treatment; e.g., as illus- 
trated below, for a time series of 12 total obser- 
vations, half occurring before and half after the 
treatment or intervention, we can express the 
series as 


$O_1\ O_2\ O_3\ O_4\ O_5\ O_6\ X\ O_7\ O_8\ O_9\ O_{10}\ O_{11}\ O_{12}$


where O represents an observation, the 
subscript on O indexes the time at which 
the observation was taken, and X indicates 
the treatment or intervention, here occurring 
between the sixth and seventh observations. 
Both the intervention and, separately, the
effects of the intervention may be either discrete,
occurring at a specific time and not
before or after that time, or persistent,
absent prior to some specific time and
present after that time. An example of a discrete
intervention, occurring between two observa- 
tions and at no other time, might be a one- 
time increase in funding to provide equipment 
to a law enforcement agency or a short-term 
therapy regimen. An example of a persistent 
intervention might be a change in policy, for 
example presumptive arrest in domestic vio- 
lence cases, or an ongoing, perhaps permanent 
treatment regimen, for example medication to 
suppress the effects of human immunodefi- 
ciency virus (HIV). Regardless of whether the 
treatment itself is discrete or persistent, the 
effects of the treatment may be either discrete 
or persistent, and if persistent, persistent over 
a shorter or a longer span of time, possibly 
decaying more or less rapidly over time. Further 
details on intervention models may be found 
in Box and Jenkins (1970; see also the more 
recent edition by Box et al., 1994), McCleary 
and Hay (1980), Wei (2006), and Yaffee and 
McGee (2000). In the present volume, further
discussion of time series intervention models is
deferred to Chapter 36 by Sanders and Ward. 
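
As a rough illustration of how discrete and persistent interventions might be represented as input variables, the sketch below builds three indicator series for the 12-observation design shown above, with the intervention falling between the sixth and seventh observations. The function names and the geometric decay rate are assumptions made for illustration; they are not taken from the sources cited.

```python
import numpy as np

def pulse_indicator(n_obs, first_post):
    """Discrete effect: 1 only at the first post-intervention observation."""
    x = np.zeros(n_obs, dtype=int)
    x[first_post] = 1
    return x

def step_indicator(n_obs, first_post):
    """Persistent effect: 0 before the intervention, 1 from then on."""
    x = np.zeros(n_obs, dtype=int)
    x[first_post:] = 1
    return x

def decaying_indicator(n_obs, first_post, rate):
    """Effect that starts at 1 and decays geometrically (assumed form)."""
    x = np.zeros(n_obs)
    x[first_post:] = rate ** np.arange(n_obs - first_post)
    return x

# Zero-based index 6 is the seventh observation, the first after X.
pulse = pulse_indicator(12, 6)
step = step_indicator(12, 6)
decay = decaying_indicator(12, 6, rate=0.5)
```

Any of these series could then be entered as a predictor of the outcome series; which coding is appropriate depends on the hypothesized duration of the effect.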


8.2 Multivariate time series analysis 


Intervention analysis focuses on one-time dis- 
crete or persistent effects of interventions that 
effectively split a time series into “before” 
and “after” segments. Other influences on time 
series cannot be represented so simply, and 
may themselves be time series varying over 
time in much the same way as the outcome. 
Bivariate and multivariate time series mod- 
els are available to explain an outcome time 
series in terms of one or more input time 
series, and to assess whether it is more plausi- 
ble that the purported outcome time series is, 
in fact, dependent upon the purported input 
time series, or whether the evidence offers bet- 
ter support for the purported outcome’s being 
predictive of the purported input. There are 
several approaches to the analysis of time 
series involving several variables, either with 
or without an a priori definition of which 
variables are outcomes or predictors, several 
of which (autoregressive maximum likelihood 
models, lagged endogenous variable models, 
Box-Jenkins ARIMA models) are described and 
compared in Chapter 36 by Sanders and Ward 
in this volume. Here, therefore, only a brief 
overview is provided. There are other models, 
particularly as used in econometrics, which can 
be applied to the analysis of time series; for fur- 
ther details on distributed lag models, Kalman 
filter and state space models, and related top- 
ics, see, e.g., Amemiya (1985) or Johnston and 
DiNardo (1997); for a briefer treatment of time 
series regression techniques, see Ostrom (1990). 


8.3 Autoregressive error models 


As described in, e.g., Ostrom (1990; see also 
Sanders and Ward, Chapter 36, in this volume),
the use of ordinary least squares (OLS)
multiple regression analysis for multivariate
time series analysis has the serious disadvan- 
tage that the errors tend not to be indepen- 
dent, but are instead themselves autocorrelated. 
This results in underestimation of standard 
errors, overestimation of explained variance, 
and increased risk of Type I (falsely rejecting 
the null hypothesis) error. One approach to this 
problem is to explicitly model the autoregres- 
sive nature of the errors. This may be done 
by assuming an AR(1) or higher order AR(p) 
model and estimating the model using maxi- 
mum likelihood or other estimation techniques. 
Statistical software for estimating the AR(p) 
model in the autoregressive error model con- 
text and the ARIMA model context should pro- 
duce the same results if the same estimation 
method (e.g., the same maximum likelihood 
estimation algorithm) is used. The autoregres- 
sive error model may be extended to also 
model heteroscedasticity in the error variance, 
using autoregressive conditional heteroscedas- 
ticity (ARCH) or generalized autoregressive 
conditional heteroscedasticity (GARCH) mod- 
els (for introductory treatments of which see, 
e.g., Wei, 2006; Yaffee and McGee, 2000). In 
the ARCH model, assuming a contemporaneous
effect of X on Z (for a lagged effect, the subscript
$t$ would be replaced by $t-1$ for the predictors
$x_{kt}$),

$z_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + e_t$,

where $e_t$ is assumed to be normally distributed
with mean zero and variance

$h_t = \alpha_0 + \alpha_1 e_{t-1}^2 + \alpha_2 e_{t-2}^2 + \cdots + \alpha_p e_{t-p}^2$,

in which case the residual variance (and by implication
the error variance) is not constant but (a) varies
over time and (b) depends on prior values of the
time-specific residuals. This variation is explicitly
modeled in the ARCH model. In the GARCH
model, $h_t$ depends not only on past values of
$e_t$ but also on past values of $h_t$. The difference
of the simple autoregressive error model
from the ARCH and GARCH models lies in the
explicit modeling of not only the value but also
the variance of $e_t$ in the ARCH and GARCH
models.
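
To make the ARCH variance equation concrete, the following sketch computes $h_t$ from a given residual series and assumed coefficients. It is only an illustration of the formula: `arch_variance` is a hypothetical helper, the coefficient values are arbitrary, and in practice the coefficients would be estimated (e.g., by maximum likelihood) rather than supplied.

```python
import numpy as np

def arch_variance(e, alpha0, alphas):
    """h_t = alpha0 + sum_i alphas[i-1] * e_{t-i}^2, defined for t >= p."""
    e = np.asarray(e, dtype=float)
    p = len(alphas)
    h = np.full(len(e), np.nan)   # the first p values have no full lag history
    for t in range(p, len(e)):
        h[t] = alpha0 + sum(a * e[t - i] ** 2
                            for i, a in enumerate(alphas, start=1))
    return h

# ARCH(1) with arbitrary illustrative coefficients.
h = arch_variance([1.0, 2.0, 3.0], alpha0=0.5, alphas=[0.5])
```

A large residual at time $t-1$ raises the variance at time $t$, which is the sense in which the error variance depends on prior residuals.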


8.4 Granger causality


An approach that tests for both causal direction 
and strength of causal influence was proposed 
by Granger (1969). For two variables $Z_t$ and $Y_t$,
both of which can be expressed as stationary
time series with zero means,

$Z_t = \sum_{j=1}^{m1} a_j Z_{t-j} + \sum_{j=1}^{m2} b_j Y_{t-j} + e_t$

and

$Y_t = \sum_{j=1}^{m3} c_j Z_{t-j} + \sum_{j=1}^{m4} d_j Y_{t-j} + f_t$,

where $e_t$ and $f_t$ are taken to be two uncorrelated
white noise series, and each of m1, m2, m3, and m4 is greater than
zero but less than the length of the time series.
According to the criterion of Granger causality,
$Y$ causes $Z$ if some $b_j$ is statistically significantly
different from zero. Correspondingly, $Z$
causes $Y$ if some $c_j$ is statistically significantly
different from zero. In effect, the question posed 
by the test for Granger causality becomes, “Is 
there variation in one variable which cannot 
be explained by past values of that variable, 
but can be explained by past values of another 
variable?” If the answer is yes, then the second
variable “Granger-causes” the first. Instantaneous
effects (e.g., from $x_t$ to $y_t$) are excluded
from the model. One issue here is the choice 
of m1, m2, m3, and m4, the number of lags to 
include in each part of the model. In general, 
the more prior values of the endogenous vari- 
able are in the equation, the greater is the like- 
lihood of rejecting the hypothesis of Granger 
causality, but the inclusion of additional values 
of the endogenous variable may have no signif- 
icant effect beyond some number. This number 
may be estimated by modeling the endogenous 
variable as an autoregressive time series, or 
by calculating separate ordinary least-squares 
regression models and examining the change 
in the explained variance ($R^2$) produced by
the inclusion of each additional lagged endoge- 
nous variable (e.g., by the addition of y,_,). 


If there is no statistically significant change in 
the explained variance, there would seem to be 
little point in including this term in the equa- 
tion. If only a single lag is used for each out- 
come (m1 =1 and m4 = 1), we have for both 
X and Y the lagged endogenous variable (LEV) 
OLS model described by Sanders and Ward in 
Chapter 36. Granger causality may be exam- 
ined in its own right, or as part of the analysis 
of dynamic regression/linear transfer function 
and vector autoregression models, as described 
below. 
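
One common way to operationalize the Granger criterion is an F test comparing a regression of the outcome on its own lags with a regression that adds lags of the other variable. The sketch below is a simplified illustration under several assumptions that are not part of the text above: the same number of lags is used for both variables, an intercept is included, and the data are simulated so that Y influences Z but not the reverse. The helper names `lagged` and `granger_f` are hypothetical.

```python
import numpy as np

def lagged(x, lags):
    """Columns x_{t-1}, ..., x_{t-lags}, aligned with t = lags, ..., n-1."""
    n = len(x)
    return np.column_stack([x[lags - j: n - j] for j in range(1, lags + 1)])

def granger_f(target, other, lags):
    """F statistic for adding `lags` lags of `other` to an AR model of `target`."""
    target = np.asarray(target, dtype=float)
    other = np.asarray(other, dtype=float)
    y = target[lags:]
    x_restricted = np.column_stack([np.ones(len(y)), lagged(target, lags)])
    x_full = np.column_stack([x_restricted, lagged(other, lags)])

    def rss(x):
        beta = np.linalg.lstsq(x, y, rcond=None)[0]
        return np.sum((y - x @ beta) ** 2)

    rss_r, rss_u = rss(x_restricted), rss(x_full)
    df_resid = len(y) - x_full.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_resid)

# Simulated example: Y depends only on its own past; Z depends on past Y.
rng = np.random.default_rng(0)
n = 500
y_series = np.zeros(n)
z_series = np.zeros(n)
for t in range(1, n):
    y_series[t] = 0.5 * y_series[t - 1] + rng.normal()
    z_series[t] = 0.3 * z_series[t - 1] + 0.5 * y_series[t - 1] + rng.normal()

f_y_to_z = granger_f(z_series, y_series, lags=1)  # expected to be large
f_z_to_y = granger_f(y_series, z_series, lags=1)  # expected to be small
```

The F statistic would be referred to an F distribution with `lags` and `df_resid` degrees of freedom; a significant value for one direction and not the other is the pattern described in the text as one variable Granger-causing the other.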


8.5 Transfer function, dynamic 
regression/linear transfer function, 
and vector autoregression models 


Transfer function models are typically explicit 
in identifying which variable is an outcome or 
effect and which is a predictor or cause. In a 
transfer function model, whether X causes Y or 
Y causes X may be based on a priori knowl- 
edge of the process, but can also be assessed 
based on the cross-correlation function between 
X and Y, which examines correlations of $y_t$
with $x_{t-1}, x_{t-2}, \ldots, x_{t-m3}$ and the correlations of
$x_t$ with $y_{t-1}, y_{t-2}, \ldots, y_{t-m2}$ (here using m2 and
m3 to indicate the lags examined, the same as 
in the Granger causality equations above). In 
examining the cross-correlation functions, it is 
assumed that the respective series are, or have 
been rendered, stationary. Once the model has 
been identified, based on the cross-correlation 
function, the impact of X on Y (or Y on X) 
is estimated using maximum likelihood tech- 
niques, as in estimation of simple univariate 
ARIMA models. In order to calculate the impact 
of the predictor on the outcome, it is neces- 
sary to prewhiten the series by (a) using infor- 
mation about the ARIMA process generating 
the input series to transform the input series 
to a white noise process, then (b) applying 
the same set of transformations to the outcome 
series, before (c) calculating the impact of the 
input series on the outcome series. For example,
if the input series can be represented as
$x_t = a_t + x_{t-1} + \phi_1(x_{t-1} - x_{t-2})$ [an ARIMA(1,1,0)
process], we calculate $a_t = x_t - x_{t-1} - \phi_1(x_{t-1} - x_{t-2})$
and apply the same transformation to Y
to obtain the output series, now designated
Z: $z_t = y_t - y_{t-1} - \phi_1(y_{t-1} - y_{t-2})$. Prewhitening
removes any spurious correlation based on 
trend between X and Z (the transformed series 
Y). Prewhitening is relatively easily achieved 
for bivariate transfer function models but, for 
models with two or more inputs, it can be more 
complicated. 
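
The prewhitening steps for the ARIMA(1,1,0) example can be sketched as follows. This is an illustration under stated assumptions rather than a full transfer function analysis: the value of $\phi_1$ is treated as known (in practice it would be estimated from the input series), the data are simulated with the output responding to the input at a lag of three periods, and `prewhiten` and `cross_corr` are hypothetical helper names.

```python
import numpy as np

def prewhiten(series, phi1):
    """Apply the ARIMA(1,1,0) filter: difference, then remove the AR(1) part."""
    d = np.diff(np.asarray(series, dtype=float))
    return d[1:] - phi1 * d[:-1]

def cross_corr(inp, out, lag):
    """Correlation between inp at time t - lag and out at time t."""
    if lag == 0:
        return np.corrcoef(inp, out)[0, 1]
    return np.corrcoef(inp[:-lag], out[lag:])[0, 1]

# Simulate an ARIMA(1,1,0) input and an output that follows it 3 periods later.
rng = np.random.default_rng(1)
n, phi1 = 200, 0.6
a = rng.normal(size=n)
d = np.zeros(n)
d[0] = a[0]
for t in range(1, n):
    d[t] = phi1 * d[t - 1] + a[t]
x = np.cumsum(d)
y = 2.0 * np.concatenate([np.zeros(3), x[:-3]]) + rng.normal(scale=0.5, size=n)

alpha = prewhiten(x, phi1)   # recovers the input shocks a_t
z = prewhiten(y, phi1)       # same filter applied to the output
ccf = [cross_corr(alpha, z, k) for k in range(6)]
```

In this simulated case the cross-correlation between the prewhitened series peaks at lag 3, the delay built into the data, whereas the raw series would be correlated at many lags simply because both trend.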

In an alternative approach called dynamic 
regression (DR) or linear transfer function (LTF) 
modeling (see, e.g., Yaffee and McGee, 2000), 
the argument is made that there is no need 
for prewhitening, as long as the input series 
are not highly correlated. The approach here 
is similar to causal modeling using OLS multi- 
ple regression, with the outcome and predictor 
variables identified based on theory rather than 
empirically, using a cross-correlation function, 
although a Granger causality test provides use- 
ful insight into the identification of which vari- 
ables are predictors and which are outcomes. In 
the DR/LTF approach, the input and outcome 
series are rendered stationary, the error term is 
modeled as an autoregressive process (or possibly
an integrated process if $\phi_1$ is approximately
equal to 1), and the outcome series Z is modeled
as a function of one or more lagged values of
predictor series $X_1, X_2, \ldots, X_k$. The process for
modeling DR/LTF models does require a series 
of steps to make sure that what is modeled is not 
spurious correlation or random variation, but 
these steps are easily performed in the context 
of the basic ARIMA modeling approach. 

Vector autoregressive (VAR) models (e.g., 
Brandt and Williams, 2007; Stock and Watson, 
2001; Wei, 2006) take a more exploratory 
approach to modeling multivariate time series. 
The emphasis here, as described by Brandt and 
Williams, is on avoiding restrictive and quite
possibly false assumptions made in alternative 
methods (ARIMA transfer function modeling, 
structural equation modeling, autoregressive
error models), and instead letting the data
determine the structure of the model as much 
as possible. In a VAR model, all of the variables 
are treated as endogenous, subject to influence 
by all of the other variables in the model. VAR 
models allow for the existence of simultane- 
ous and mutual influence among the variables 
in the model, but the modeling strategy itself 
allows analysis using OLS multiple regression 
calculating separate equations for each of the 
variables in the model. In a two variable model, 
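the equations

$z_t = \eta_1 y_t + \sum_p \rho_{1,p} z_{t-p} + \sum_p \gamma_{1,p} y_{t-p} + u_{1,t}$

and

$y_t = \eta_2 z_t + \sum_p \rho_{2,p} y_{t-p} + \sum_p \gamma_{2,p} z_{t-p} + u_{2,t}$

can be reparameterized as

$z_t = \alpha_1 + \sum_p \phi_{1,p} z_{t-p} + \sum_p \beta_{1,p} y_{t-p} + \varepsilon_{1,t}$

and

$y_t = \alpha_2 + \sum_p \phi_{2,p} y_{t-p} + \sum_p \beta_{2,p} z_{t-p} + \varepsilon_{2,t}$

where $\phi$ and $\beta$ can be expressed in terms of
the original parameters (and the $\alpha$ coefficients
are just constants added to the equation, consistent
with the usual practice in OLS regression).
The reparameterized equations can be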
estimated using OLS regression, because they 
are not expressed in terms of simultaneous 
effects of one endogenous variable on another 
(hence simultaneity bias in the original but not 
the reparameterized equations). 
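
The equation-by-equation OLS estimation of the reparameterized (reduced form) VAR can be sketched for the simplest case of two variables and one lag. The function `fit_var1`, the coefficient matrix used to simulate the data, and the sample size are all assumptions made for this illustration.

```python
import numpy as np

def fit_var1(data):
    """OLS fit of a VAR(1): each column regressed on a constant and all lag-1 values.

    Returns (intercepts, A), where row i of A holds the lag-1 coefficients
    for the equation predicting column i of `data`.
    """
    data = np.asarray(data, dtype=float)
    outcomes = data[1:]
    predictors = np.column_stack([np.ones(len(outcomes)), data[:-1]])
    coef, *_ = np.linalg.lstsq(predictors, outcomes, rcond=None)
    return coef[0], coef[1:].T

# Simulated two-variable system: column 0 is z, column 1 is y.
rng = np.random.default_rng(42)
A_true = np.array([[0.5, 0.2],    # z_t depends on z_{t-1} and y_{t-1}
                   [0.0, 0.4]])   # y_t depends only on y_{t-1}
n = 5000
data = np.zeros((n, 2))
for t in range(1, n):
    data[t] = A_true @ data[t - 1] + rng.normal(size=2)

intercepts, A_hat = fit_var1(data)
```

Because only lagged values appear on the right-hand side, each equation can be estimated separately by OLS, and with a long simulated series the estimates in `A_hat` should lie close to `A_true`.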

As described in Brandt and Williams (2007), 
the purposes of VAR models are assessing 
causal relationships, for which purpose Granger 
causality analysis is performed as part of the 
analysis; assessing dynamic impacts, which is 
done by inverting the VAR equations to find 
the moving average representation of the model; 
and determining how much of the variation of 
each variable is attributable to dynamic changes
in other variables, with particular interest in
how much of an impact the contemporaneous 
influences have compared to the lagged influ- 
ences. In VAR models, in contrast to both trans- 
fer function and DR/LTF models, differencing 
is discouraged, because it obscures information 
of interest about trends in the model. Also in 
contrast to transfer function models, prewhiten- 
ing is not appropriate in VAR models because 
it can alter the dynamics of the model and 
change the assessment of Granger causality. The 
principal advantages of VAR lie in its ease of 
computation, its avoidance (compared to other 
multivariate time series methods) of potentially 
false assumptions used to identify the struc- 
ture of the model, and its ability to investigate 
a potentially broad range of possible specifica- 
tions of the dynamic relationships in the data. 
Criticisms have been that it cannot truly assess 
causal relationships, that it is atheoretical, and 
that VAR models tend to be overparameterized 
in their attempt to avoid simplifying assump- 
tions that might result in more parsimonious 
models. 


9 Forecasting 


Thinking in the context of longitudinal 
research, with an implicit interest in descrip- 
tion and explanation, possibly including the- 
ory testing, time series analysis is one of many 
tools, having utility for addressing some spe- 
cific problems that may arise in longitudinal 
research. This is not, however, entirely consis- 
tent with the context in which time series anal- 
ysis techniques were developed. Time series 
analysis, perhaps more than other techniques of 
analysis used in longitudinal research, is pre- 
eminently an applied technique. The underly- 
ing reason for the interest in describing time 
series processes is to predict future values of the 
outcome variable, in order to control the out- 
come by controlling inputs. Forecasting future 
values of the time series is a relatively minor 
topic in the context of longitudinal research, 
where greater emphasis is placed on explaining
past outcomes than on accurately predicting,
and intervening to control, future outcomes. 
In the context of time series analysis, how- 
ever, forecasting and control (the subtitle of Box 
and Jenkins, 1976) may be the whole point of 
time series analysis. A similar applied orien- 
tation is characteristic of intervention analysis 
using time series techniques; the whole point 
is to determine how much of an impact over 
what span of time we can expect as a result of 
an intervention. In keeping with the longitudi- 
nal research focus of this volume, forecasting 
is treated only briefly here, but greater detail 
is available in most (particularly book-length) 
treatments of time series analysis. 

One approach to forecasting derives from the 
curve fitting approach described at the begin- 
ning of this chapter. One simply uses the for- 
mula for the fitted curve, with future time 
points as input, to forecast the future value 
of the series. The danger in this, of course, 
is that the curve which has been fitted to the 
observed data may to some extent be capitaliz- 
ing on random variation in the data, and pre- 
dicted values may quickly diverge from future 
values of the outcome that have not been used 
to model the data. Consider once again the pat- 
tern in Figure 34.2. At the very end of the series, 
the predicted value is increasing, and because 
the curve at the end is quickly dominated by the 
cubic and quartic terms in the polynomial equa- 
tion, the curve will continue to increase rapidly; 
but the right tail of the observed data actually 
appears to have a downward trend. Other meth- 
ods of forecasting such as exponential smooth- 
ing (see, e.g., Yaffee and McGee, 2000) include 
the use of long (e.g., 10 time points) moving averages
to smooth the curve, coupled with one-step
forecasts that take the existing moving average 
at time t, forecast the next value of the outcome 
at time t+1, calculate a new moving average 
of the same length that includes the time t+1 
forecast as the last data point in the moving 
average, then use that moving average (including
the forecast data point) to forecast the next
($t+2$) value of the series. This one-step-ahead
approach can include adjustments for trends in 
the data to improve the forecasts, and is an alter- 
native to forecasting based solely on a deter- 
ministic model for trend. 
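
The recursive one-step procedure just described can be sketched in a few lines. This illustration uses a plain, equally weighted moving average with no trend adjustment and no exponential weighting, so it is a simplification of the smoothing methods cited; `moving_average_forecast` is a hypothetical helper name.

```python
def moving_average_forecast(series, window, steps):
    """Forecast `steps` values, feeding each forecast back into the moving average."""
    history = list(series)
    if len(history) < window:
        raise ValueError("series must contain at least `window` observations")
    forecasts = []
    for _ in range(steps):
        next_value = sum(history[-window:]) / window  # current moving average
        forecasts.append(next_value)
        history.append(next_value)                    # treat forecast as the newest point
    return forecasts

# Ten observations 1, 2, ..., 10 and a 10-point moving average.
forecasts = moving_average_forecast(range(1, 11), window=10, steps=2)
# forecasts[0] is 5.5; forecasts[1] averages 2, ..., 10 and 5.5, giving 5.95
```

The example also shows the weakness of an unadjusted moving average for a trending series: the observed values rise steadily to 10, yet the forecasts fall back toward the middle of the window.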

As indicated in the ARIMA analysis of the 
data in Figure 34.2, the trend is stochastic 
drift, the result of random differences between 
adjacent values of the outcome plus the lin- 
gering effect of a past random shock, and in 
this sense, the curve fitted to the data really is 
capitalizing on random variation. Similarly, the 
moving average plus exponential smoothing 
approach has limited information upon which 
to base its forecast. Using the ARIMA model to 
forecast the series also holds little promise for 
improvement. Because the process is driven 
by random shocks whose expected value is 
$E(a_t) = 0$, the “best” forecast in this instance
may simply be to forecast that the value of 
the next point in the series will be equal to the 
previous value of the series. In short, when the 
process involves only random shocks, there is 
little real information upon which to base a 
forecast other than the most recent value of the 
series. When the values of the series depend 
on more than one past value of the series, 
more accurate forecasting may be possible. To 
the extent that there are deterministic trends, 
seasonality, and cyclicity in the data, this 
information can also be used to produce more 
accurate forecasts. Different methods can also 
be combined to improve forecasts. 

One way to assess the accuracy of a forecast- 
ing model is to split the data into two sets, a 
longer series of earlier observations on which 
the forecast model is developed, and a shorter 
series of later observations on which the model 
can be tested. Based on a review (Yaffee and 
McGee, 2000) of comparative studies of fore- 
casting accuracy, including simulation results, 
ARIMA models appear to do well for short-term 
forecasting, while simple regression techniques 
may outperform other methods for medium- 
and long-term forecasts. This is because in the
short term, random variation (which is modeled
in the Box-Jenkins ARIMA approach) may
have a substantial impact, as may seasonality, 
but in the longer term, random variation and 
seasonality tend to be dominated by the other 
systematic components of the process, partic- 
ularly deterministic trends; it is the processes 
that are primarily driven by random shocks that 
are most difficult to forecast accurately. Also as 
noted by Yaffee and McGee (2000), combined 
methods, particularly when they include the 
Box-Jenkins ARIMA approach, generally out- 
perform single-method forecasting approaches. 
Unsurprisingly, the farther in the future the 
forecast is made, the less accurate it will be 
with any methods, and the more information 
(as opposed to random variation) is available 
on the process, the more accurate the forecasts 
will be. 
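
The split-sample assessment of forecast accuracy can be illustrated with a small sketch. The two forecasters compared here, a "last value" forecast (the forecast implied by a pure random walk) and a "training mean" forecast, are chosen only for simplicity, and mean absolute error is one of several possible accuracy measures; all function names are hypothetical.

```python
import numpy as np

def naive_forecast(train, steps):
    """Forecast every future value as the last observed value."""
    return np.full(steps, train[-1])

def mean_forecast(train, steps):
    """Forecast every future value as the mean of the training series."""
    return np.full(steps, train.mean())

def holdout_mae(series, n_test, forecaster):
    """Fit on the earlier observations, score on the last `n_test` observations."""
    series = np.asarray(series, dtype=float)
    train, test = series[:-n_test], series[-n_test:]
    predictions = forecaster(train, n_test)
    return float(np.mean(np.abs(test - predictions)))

series = [1, 2, 3, 4, 5, 6]
mae_naive = holdout_mae(series, n_test=2, forecaster=naive_forecast)  # 1.5
mae_mean = holdout_mae(series, n_test=2, forecaster=mean_forecast)    # 3.0
```

The same `holdout_mae` function could be used with any forecaster that accepts a training series and a number of steps, which makes it straightforward to compare competing methods on the same held-out observations.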


10 Conclusion 


Time series analysis may be used to answer any 
of the following questions. 

(1) What is the functional relationship of an 
outcome variable Z with time? Here the answer 
is an equation with Z as the dependent variable, 
or some function of Z as the dependent variable, 
where the function of Z may involve subtraction 
of the mean to center the series at zero, differ- 
encing to remove linear or higher order trends, 
taking the natural logarithm (sometimes done 
to stabilize the variance over the length of the 
series), or other transformations. The predictors 
are functions of time, for example expressing 
Z as a polynomial function of time, $z_t = A_0 + A_1 t + A_2 t^2 + \cdots + A_k t^k$;
or an exponential function, $z_t = A_0 + A_1 e^t$; or some other function of
time. This question may best be answered by 
graphical representation and curve fitting tech- 
niques. The objective may or may not be to 
describe the process, but it is to describe the 
pattern. 

(2) How does the current value of an out- 
come variable Z depend on past values of Z? 
Here the answer is an equation with Z (or
some function of Z) as the dependent variable
and one or more lagged values of Z as predictors,
possibly with an assumed coefficient of
1, as in the ARIMA(0,1,0) random walk or the
ARIMA(0,1,1) random walk with drift models; or
with estimated $\phi$ coefficients where $z_t = \phi_1 z_{t-1} +
\phi_2 z_{t-2} + \cdots + \phi_p z_{t-p}$; or some combination of
the two. This question may best be answered 
by an ARIMA(p,d,q) model with d+p > 0. 

(3) How do past random shocks affect the 
current value of an outcome Z? Here Z is 
expressed as a function of the current random 
shock $a_t$ and one or more past random shocks
$a_{t-1}, a_{t-2}, \ldots, a_{t-q}$ such that $z_t = \theta_0 + a_t + \theta_1 a_{t-1} +
\theta_2 a_{t-2} + \cdots + \theta_q a_{t-q}$, and for a series centered on
its mean, $\theta_0$ will be equal to zero. This question
may best be answered by an ARIMA(p,d,q)
model with q > 0. 

(4) How do past (and possibly contemporaneous)
values of one or more time-varying
predictors $Y_{kt}$ affect the value of the outcome
variable Z? Here, $Y_{kt}$ may be a one-time
intervention, coded zero for all but the one
time in which the intervention occurs and one
for the single point in time of the intervention,
for an acute, nonenduring intervention;
or it may be coded zero for the time prior to
the intervention and one for the time subsequent
to the intervention, for an intervention
with enduring effects; or it may have some
other coding, corresponding to the hypothesized
duration of the effects; or the duration
itself may be an unknown to be modeled. Alternatively,
or in the same analysis, $Y_{kt}$ may be
another time series that influences the outcome.
Alternatives for answering this question
include autocorrelated error models, autoregressive
conditional heteroscedasticity
(ARCH) and generalized autoregressive
conditional heteroscedasticity (GARCH) models,
multivariate lagged endogenous variable
models, transfer function models, dynamic
regression (DR)/linear transfer function (LTF)
models, and vector autoregression (VAR)
models.


(5) We may want to ask all of these questions
at the same time; the result would be a complex
model,

$\nabla^d z_t = a_t + \theta_0 + \sum_q \theta_q a_{t-q} + \sum_p \phi_p z_{t-p} + \sum_m \gamma_m f_m(t) + \sum_k \beta_k y_{kt}$.

In principle, it should be
possible to estimate such a model. In practice,
trends would not be represented by both
differencing and some function of time, and
either the analysis would proceed on the residuals
of a deterministic trend model represented
by $\sum_m \gamma_m f_m(t)$, in which case the explicit time
component $\sum_m \gamma_m f_m(t)$ would be subtracted from
$z_t$ and the integrated component $\nabla^d$ would be
excluded from the model; or the model would
drop the explicit time component $\sum_m \gamma_m f_m(t)$
and retain the integrated component $\nabla^d$, leaving
either the model with the residualized
deterministic time trend,

$[z_t - \sum_m \gamma_m f_m(t)] = a_t + \theta_0 + \sum_q \theta_q a_{t-q} + \sum_p \phi_p z_{t-p} + \sum_k \beta_k y_{kt}$,

or the model

$\nabla^d z_t = a_t + \theta_0 + \sum_q \theta_q a_{t-q} + \sum_p \phi_p z_{t-p} + \sum_k \beta_k y_{kt}$.

Once again, for a time series centered on its
mean, $\theta_0$ would drop out of the model. Because
it readily models all of these different com- 
ponents, the ARIMA model, possibly incorpo- 
rating the DR/LTF approach for time-varying 
covariates, provides perhaps the most flexible 
approach to time series modeling, as long as 
there are sufficient data (typically 50 or more 
time-specific observations) to support the use of 
the model. For shorter series, other approaches 
may be preferable, and for large numbers of 
cases, alternative models such as latent or mul- 
tilevel growth curve models are better suited 
to model the relationships of predictors to out- 
come variables. 


Software 


Time series analysis can be performed with 
specialized software such as RATS, or with 
modules in existing general-purpose statistical 
packages such as SAS, SPSS, Stata, and SYS- 
TAT. For time series curve fitting, the SPSS 
CURVEFIT routine is particularly useful, while 
Stata has user-friendly options for ARCH and 
GARCH models. For the time series analysis
performed here, SPSS and Stata statistical routines
were used.


Bibliographic note 


Several time series texts and associated statistical
software manuals have been used in the
preparation of this chapter. Box and Jenkins 
(1970) is the classic on ARIMA time series anal- 
ysis, and Box et al. (1994) is a more recent 
edition of this classic. The parallel classic for 
spectral analysis is Jenkins and Watts (1968). 
An out-of-print and slightly out-of-date but 
well written introductory treatment of ARIMA 
time series analysis can be found in McCleary 
and Hay (1980). A more up-to-date introduc- 
tion from a social science perspective is Yaffee 
and McGee (2000), which includes detailed 
instruction on the use of SAS and SPSS for 
time series analysis. A brief introduction with 
an emphasis on the comparison among SAS, 
SPSS, and SYSTAT time series analysis rou- 
tines for ARIMA time series models is offered 
by Tabachnick and Fidell (2007) on their web- 
site, www.ablongman.com/tabachnick5e. Wei 
(2006) offers a more detailed and more 
advanced treatment of time series analysis, 
including spectral analysis.


Chapter 35


Spectral analysis 
William W.S. Wei 


Spectral analysis is a statistical method for identifying the statistically important frequencies present in a time series, and hence for determining whether the series contains periodic or cyclical components. After reviewing some basic time series concepts, we introduce periodogram analysis, a useful technique for searching for hidden periodicities. We then introduce the spectrum of a time series, the sample spectrum, and its smoothing. In the process, we introduce some commonly used spectral and lag windows and a procedure to obtain confidence intervals for the underlying spectral ordinates. We also discuss the cross-spectrum, which can be used to analyze the relationships between frequency components in two time series. Empirical examples are used to illustrate the concepts and procedures. The chapter ends with some mathematical justifications of results that have been used throughout the chapter.


1 Introduction 


A time series is often referred to as an ordered 
sequence of observations. The ordering is usu- 
ally through time, especially in terms of some 
equally spaced time intervals. For example, 
we observe daily calls to directory assistance, 
monthly international airline passengers, quar- 
terly unemployment numbers, annual imports 
and exports, and various mortality rates and 


crime rates. The body of statistical methodol- 
ogy available for studying time series is referred 
to as time series analysis. More formally, a
time series is a realization of a time series process,
that is, a family of time-indexed random
variables $Z_t$, where $t$ belongs to an index set.
For most of our discussion, we assume that 
the index set is the set of all integers. The 
fundamental goal of time series analysis is to 
investigate the underlying process through an 
observed time series or realization. Thus, it 
is important to understand some fundamental 
characteristics of a time series process. 

A time series process is said to be strictly stationary if the joint distribution of $(Z_{t_1}, \ldots, Z_{t_n})$ is the same as the joint distribution of $(Z_{t_1+k}, \ldots, Z_{t_n+k})$ for any n-tuple $(t_1, \ldots, t_n)$ and $k$ of integers. The terms strongly stationary and completely stationary are also used to denote a strictly stationary process. Since strict stationarity is defined in terms of the distribution function, and it is very difficult or impossible to verify a general distribution function, for a given time series process $Z_t$, $t = 0, \pm 1, \pm 2, \ldots$, we often concentrate our study on some of its important parameters such as moments, including the mean function of the process

$$\mu_t = E(Z_t) \qquad (1.1)$$

the variance function of the process

$$\sigma_t^2 = \mathrm{Var}(Z_t) = E(Z_t - \mu_t)^2 \qquad (1.2)$$

the autocovariance function between $Z_{t_1}$ and $Z_{t_2}$

$$\gamma(t_1, t_2) = E(Z_{t_1} - \mu_{t_1})(Z_{t_2} - \mu_{t_2}) \qquad (1.3)$$

and the autocorrelation function between $Z_{t_1}$ and $Z_{t_2}$

$$\rho(t_1, t_2) = \frac{\gamma(t_1, t_2)}{\sqrt{\sigma_{t_1}^2}\,\sqrt{\sigma_{t_2}^2}} \qquad (1.4)$$

For a strictly stationary process, since the distribution function is the same for all $t$, the mean function, $\mu_t = \mu$, is a constant, provided $E(|Z_t|) < \infty$. Similarly, if $E(Z_t^2) < \infty$, then the variance function, $\sigma_t^2 = \sigma^2$, is constant, and the autocovariance and autocorrelation functions depend only on the time difference, i.e., $\gamma(t, t+k) = E(Z_t - \mu)(Z_{t+k} - \mu) = \gamma_k$ and $\rho(t, t+k) = \gamma_k/\gamma_0 = \rho_k$, where we note that $\gamma_0 = \mathrm{Var}(Z_t) = \sigma^2$.

A time series process is said to be nth order weakly stationary if all its joint moments up to order n exist and are time invariant, i.e., independent of time origin. Therefore, a second-order weakly stationary process will have a constant mean and variance, with the autocovariance and the autocorrelation functions being functions of the time difference alone. The terms weakly stationary or stationary in the wide sense or covariance stationary are also used to describe a second-order weakly stationary process.

It is noted that process and model are sometimes used interchangeably. Hence a time series model is also used to refer to a time series process. The simplest time series process or model is a white noise process $e_t$, that is, a sequence of uncorrelated random variables from a fixed distribution with constant mean $E(e_t) = \mu_e$, usually assumed to be 0, constant variance $\mathrm{Var}(e_t) = \sigma^2$, and $\gamma_k = \mathrm{Cov}(e_t, e_{t+k}) = 0$ for all $k \neq 0$.

A time series process is said to be a normal or Gaussian process if its joint distribution is normal. Because a normal distribution is uniquely characterized by its first two moments, strictly stationary and weakly stationary are equivalent for a Gaussian process. Unless otherwise mentioned, the time series processes that we discuss are assumed to be Gaussian.
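
As a rough illustration of these definitions, the following Python sketch estimates the autocovariance and autocorrelation functions of a simulated Gaussian white noise series. The function names, the simulated series, and the choice of the divisor N in the sample autocovariance are assumptions made for this sketch and are not part of the chapter's SAS analyses.

```python
import random


def sample_autocovariance(z, k):
    """Sample autocovariance at lag k, using the divisor N."""
    n = len(z)
    k = abs(k)  # the autocovariance function is symmetric in k
    z_bar = sum(z) / n
    return sum((z[t - k] - z_bar) * (z[t] - z_bar) for t in range(k, n)) / n


def sample_autocorrelation(z, k):
    """Sample autocorrelation rho_k = gamma_k / gamma_0."""
    return sample_autocovariance(z, k) / sample_autocovariance(z, 0)


random.seed(0)  # fixed seed so the sketch is reproducible
white_noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
rho = [sample_autocorrelation(white_noise, k) for k in range(4)]
print([round(r, 3) for r in rho])  # rho[0] is 1; the others should be near 0
```

For white noise, the sample autocorrelations at nonzero lags should be small, of the order of $1/\sqrt{N}$.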


2 A simple periodic model 
and harmonic analysis 


Consider a simple zero mean time series $Z_t$ that exhibits a periodic or cyclical pattern with a fundamental period $N$, which is the smallest time period for this repetitive pattern to hold. For this periodic process, both the time series $Z_t$ and its autocovariance function $\gamma_k$ will exhibit spikes at multiple lags of $N$, i.e., at times $t = jN$ and lags $k = jN$ for $j = \pm 1, \pm 2, \ldots$. A natural representation of this periodic or cyclical phenomenon is the following simple sinusoidal model:

$$Z_t = \mu + \alpha\cos(\omega t + \phi) + e_t \qquad (2.1)$$

where $e_t$ is a zero mean Gaussian white noise process, $\alpha$ is the amplitude or height of the cycle, $\phi$ is the phase or location of the cycle peak relative to time origin zero, and $\omega = 2\pi/N$ is the fundamental frequency corresponding to the given fundamental period $N$. More often the following equivalent form, which is more convenient for computing parameter estimates, is used:

$$Z_t = \mu + a\cos(\omega t) + b\sin(\omega t) + e_t \qquad (2.2)$$

where

$$\alpha = \sqrt{a^2 + b^2} \qquad (2.3)$$

and

$$\phi = \tan^{-1}(-b/a) \qquad (2.4)$$

Given the observations $Z_t$ for $t = 1, \ldots, N$, the least squares estimates are given by

$$\hat{\mu} = \frac{1}{N}\sum_{t=1}^{N} Z_t \qquad (2.5)$$

$$\hat{a} = \frac{2}{N}\sum_{t=1}^{N} Z_t\cos(\omega t) \qquad (2.6)$$

$$\hat{b} = \frac{2}{N}\sum_{t=1}^{N} Z_t\sin(\omega t) \qquad (2.7)$$

The amplitude and phase of the cycle can then be estimated from equations (2.3) and (2.4), respectively, using (2.6) and (2.7).

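The least squares estimates from (2.5) to (2.7) follow from the trigonometric results shown in Section 7 that:

$$\sum_{t=1}^{N}\cos(\omega t) = \sum_{t=1}^{N}\sin(\omega t) = 0 \qquad (2.8)$$

$$\sum_{t=1}^{N}\cos(\omega t)\sin(\omega t) = 0 \qquad (2.9)$$

$$\sum_{t=1}^{N}\cos(\omega t)\cos(\omega t) = N/2 \qquad (2.10)$$

and

$$\sum_{t=1}^{N}\sin(\omega t)\sin(\omega t) = N/2 \qquad (2.11)$$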


Using the well known result from regression analysis that

$$\sum_{t=1}^{N}(Z_t - \bar{Z})^2 = \sum_{t=1}^{N}(Z_t - \hat{Z}_t)^2 + \sum_{t=1}^{N}(\hat{Z}_t - \bar{Z})^2 \qquad (2.12)$$

i.e., total sum of squares = sum of squares due to error + sum of squares due to regression, we obtain the sum of squares due to the cyclical or periodic component as

$$\sum_{t=1}^{N}\left[\hat{a}\cos(\omega t) + \hat{b}\sin(\omega t)\right]^2 = N(\hat{a}^2 + \hat{b}^2)/2 \qquad (2.13)$$

which follows from equations (2.9), (2.10), and (2.11). This is proportional to the squared amplitude of the fitted sinusoid, and it is the amount of variance accounted for by the periodic component at frequency $\omega$. Corresponding to the model in (2.2), this sum of squares due to regression has two degrees of freedom, one each for $a$ and $b$, and by comparing with the sum of squares due to error that has $(N - 3)$ degrees of freedom, we can use the standard F test to assess the significance of this periodic component. The detection procedure introduced above is often known as harmonic analysis.
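
The estimates (2.5) to (2.7), the decomposition (2.12) and (2.13), and the resulting F statistic can be sketched in Python as follows. This is only an illustrative sketch: the function name `harmonic_analysis`, the simulated series, and its parameter values are hypothetical, and the series length is assumed to be a whole multiple of the period so that the orthogonality results (2.8) to (2.11) hold.

```python
import math
import random


def harmonic_analysis(z, period):
    """Fit Z_t = mu + a cos(wt) + b sin(wt) + e_t with w = 2*pi/period."""
    n = len(z)
    w = 2.0 * math.pi / period
    mu = sum(z) / n                                                         # (2.5)
    a = 2.0 / n * sum(z[t - 1] * math.cos(w * t) for t in range(1, n + 1))  # (2.6)
    b = 2.0 / n * sum(z[t - 1] * math.sin(w * t) for t in range(1, n + 1))  # (2.7)
    ss_regression = n * (a * a + b * b) / 2.0                               # (2.13)
    ss_total = sum((x - mu) ** 2 for x in z)
    ss_error = ss_total - ss_regression                                     # (2.12)
    f_stat = (ss_regression / 2.0) / (ss_error / (n - 3))
    return {
        "mu": mu,
        "amplitude": math.hypot(a, b),   # (2.3)
        "phase": math.atan2(-b, a),      # (2.4), quadrant-aware arctangent
        "f_stat": f_stat,                # compare with F(2, N - 3)
    }


# Hypothetical series: mean 5, amplitude 2, phase 0.7, period 12, N = 48.
random.seed(1)
series = [5.0 + 2.0 * math.cos(2.0 * math.pi * t / 12 + 0.7) + random.gauss(0.0, 0.3)
          for t in range(1, 49)]
fit = harmonic_analysis(series, period=12)
print(fit)
```

The estimated mean, amplitude, and phase should be close to the values used to simulate the series, and the F statistic should be large relative to the critical value of F(2, 45).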


3 Periodogram analysis 


In most situations, the exact periods of cycles are not known, and we want to identify them using the available time series $Z_t$ for $t = 1, \ldots, N$. A very natural approach is to extend the harmonic analysis of Section 2 to all frequencies $2\pi k/N$ for $k = 0, 1, \ldots, [N/2]$. That is, for a given time series of $N$ observations, we consider the following representation:

$$Z_t = \sum_{k=0}^{[N/2]}\left(a_k\cos(\omega_k t) + b_k\sin(\omega_k t)\right) \qquad (3.1)$$

where $\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, [N/2]$, and $[x]$ is the greatest integer less than or equal to $x$. Suppose that the trigonometric sine and cosine functions $\cos(\omega_k t)$ and $\sin(\omega_k t)$ are defined on a finite number of $N$ points, i.e., for $t = 1, 2, \ldots, N$. The system

$$\{\cos(\omega_k t), \sin(\omega_k t) : k = 0, 1, \ldots, [N/2]\} \qquad (3.2)$$

contains exactly $N$ nonzero functions, which follows from the sine function being identically zero for $k = 0$ and $k = [N/2]$ if $N$ is even. Moreover, as shown in Section 7, the system is a set of orthogonal functions, i.e.,

$$\sum_{t=1}^{N}\cos(\omega_k t)\cos(\omega_j t) = \begin{cases} N, & k = j = 0 \text{ or } N/2\ (N \text{ even}) \\ N/2, & k = j \neq 0 \text{ or } N/2\ (N \text{ even}) \\ 0, & k \neq j \end{cases} \qquad (3.3)$$

$$\sum_{t=1}^{N}\sin(\omega_k t)\sin(\omega_j t) = \begin{cases} 0, & k = j = 0 \text{ or } N/2\ (N \text{ even}) \\ N/2, & k = j \neq 0 \text{ or } N/2\ (N \text{ even}) \\ 0, & k \neq j \end{cases} \qquad (3.4)$$

and

$$\sum_{t=1}^{N}\cos(\omega_k t)\sin(\omega_j t) = 0 \quad \text{for all } k \text{ and } j. \qquad (3.5)$$

Thus, the system in (3.2) forms a basis, and equation (3.1) fits the $N$ data points exactly. The representation is known as the Fourier series of $Z_t$, and $\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, [N/2]$, are called Fourier frequencies. Using the results in (3.3), (3.4), and (3.5), we immediately obtain the following:


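$$a_k = \begin{cases} \dfrac{1}{N}\displaystyle\sum_{t=1}^{N} Z_t\cos(\omega_k t), & k = 0 \text{ and } k = N/2 \text{ if } N \text{ is even} \\[2ex] \dfrac{2}{N}\displaystyle\sum_{t=1}^{N} Z_t\cos(\omega_k t), & k = 1, 2, \ldots, [(N-1)/2] \end{cases} \qquad (3.6)$$

$$b_k = \frac{2}{N}\sum_{t=1}^{N} Z_t\sin(\omega_k t), \quad k = 1, 2, \ldots, [(N-1)/2] \qquad (3.7)$$

which are known as Fourier series coefficients. They are, in fact, essentially the least squares estimates of the coefficients in fitting the following regression model:

$$Z_t = \sum_{k=0}^{[N/2]}\left(a_k\cos(\omega_k t) + b_k\sin(\omega_k t)\right) + e_t \qquad (3.8)$$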
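Note that equation (2.2) is a special case where we consider the components only at frequencies $\omega_0$ and $\omega_1$, and the component at $\omega_0$ gives $a_0 = \sum_{t=1}^{N} Z_t/N$, which is actually the mean of the series. The fitting will be perfect. Multiplying both sides of (3.1) by $Z_t$, summing from $t = 1$ to $t = N$, and using the relations (3.6) and (3.7), we have

$$\sum_{t=1}^{N} Z_t^2 = \begin{cases} N a_0^2 + \dfrac{N}{2}\displaystyle\sum_{k=1}^{[(N-1)/2]}(a_k^2 + b_k^2), & \text{if } N \text{ is odd,} \\[2ex] N a_0^2 + \dfrac{N}{2}\displaystyle\sum_{k=1}^{[(N-1)/2]}(a_k^2 + b_k^2) + N a_{N/2}^2, & \text{if } N \text{ is even.} \end{cases} \qquad (3.9)$$

Hence,

$$\sum_{t=1}^{N}(Z_t - \bar{Z})^2 = \begin{cases} \dfrac{N}{2}\displaystyle\sum_{k=1}^{[(N-1)/2]}(a_k^2 + b_k^2), & \text{if } N \text{ is odd,} \\[2ex] \dfrac{N}{2}\displaystyle\sum_{k=1}^{[(N-1)/2]}(a_k^2 + b_k^2) + N a_{N/2}^2, & \text{if } N \text{ is even,} \end{cases} \qquad (3.10)$$

and the result is presented as the following analysis of variance in Table 35.1.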


Table 35.1 Analysis of variance table for periodogram analysis

Source                                                        Degrees of freedom    Sum of squares
Frequency $\omega_0 = 0$ (Mean)                               1                     $N a_0^2$
Frequency $\omega_1 = 2\pi/N$                                 2                     $\frac{N}{2}(a_1^2 + b_1^2)$
Frequency $\omega_2 = 4\pi/N$                                 2                     $\frac{N}{2}(a_2^2 + b_2^2)$
...
Frequency $\omega_{[(N-1)/2]} = [(N-1)/2]\,2\pi/N$            2                     $\frac{N}{2}(a_{[(N-1)/2]}^2 + b_{[(N-1)/2]}^2)$
Frequency $\omega_{N/2} = \pi$ (exists only for even $N$)     1                     $N a_{N/2}^2$
Total                                                         N                     $\sum_{t=1}^{N} Z_t^2$

The quantity $I(\omega_k)$ defined by

$$I(\omega_k) = \begin{cases} N a_0^2, & k = 0 \\[1ex] \dfrac{N}{2}(a_k^2 + b_k^2), & k = 1, \ldots, [(N-1)/2] \\[1ex] N a_{N/2}^2, & k = N/2 \text{ when } N \text{ is even} \end{cases} \qquad (3.11)$$

is called the periodogram, and the procedure is known as periodogram analysis. It was introduced by Arthur F. Schuster (1898), who used the technique to disprove C. G. Knott's claim of periodicity in earthquake occurrences. Schuster (1906) went on to apply the method to analyzing annual sunspot activity and found the approximate eleven-year cycle of the sunspot series.
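
The Fourier coefficients (3.6) and (3.7) and the periodogram (3.11) can be computed directly from their definitions. The following Python sketch is a plain, unoptimized illustration; the function names and the short numeric series are hypothetical, and a production analysis would normally rely on a fast Fourier transform routine instead of these explicit sums.

```python
import math


def fourier_coefficients(z):
    """Return {k: (a_k, b_k)} for k = 0, ..., [N/2], following (3.6) and (3.7)."""
    n = len(z)
    coefficients = {}
    for k in range(0, n // 2 + 1):
        w = 2.0 * math.pi * k / n
        c = sum(z[t - 1] * math.cos(w * t) for t in range(1, n + 1))
        s = sum(z[t - 1] * math.sin(w * t) for t in range(1, n + 1))
        if k == 0 or (n % 2 == 0 and k == n // 2):
            coefficients[k] = (c / n, 0.0)  # the sine term vanishes at these k
        else:
            coefficients[k] = (2.0 * c / n, 2.0 * s / n)
    return coefficients


def periodogram(z):
    """Return {k: I(w_k)} for k = 0, ..., [N/2], following (3.11)."""
    n = len(z)
    ordinates = {}
    for k, (a, b) in fourier_coefficients(z).items():
        if k == 0 or (n % 2 == 0 and k == n // 2):
            ordinates[k] = n * a * a
        else:
            ordinates[k] = n * (a * a + b * b) / 2.0
    return ordinates


example = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0]  # hypothetical data
ordinates = periodogram(example)
# By (3.9), the ordinates add up to the total sum of squares of the series.
print(sum(ordinates.values()), sum(x * x for x in example))
```

The two printed values agree up to rounding error, which is the decomposition summarized in Table 35.1.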

Assume that $Z_t$ for $t = 1, \ldots, N$ are i.i.d. $N(0, \sigma^2)$. We have

$$E(a_k) = \frac{2}{N}\sum_{t=1}^{N} E(Z_t)\cos(\omega_k t) = 0$$

and

$$\mathrm{Var}(a_k) = \frac{4}{N^2}\sum_{t=1}^{N}\sigma^2[\cos(\omega_k t)]^2 = \frac{4\sigma^2}{N^2}\sum_{t=1}^{N}[\cos(\omega_k t)]^2 = \frac{4\sigma^2}{N^2}\cdot\frac{N}{2} = \frac{2\sigma^2}{N}.$$

In addition, $a_k$ and $a_j$ are independent for $k \neq j$ because of (3.3). Hence, the $a_k$ for $k = 1, 2, \ldots, [(N-1)/2]$ are i.i.d. $N(0, 2\sigma^2/N)$, and the $N a_k^2/2\sigma^2$ for $k = 1, 2, \ldots, [(N-1)/2]$ are i.i.d. chi-squares with one degree of freedom. Similarly, the $N b_k^2/2\sigma^2$ for $k = 1, 2, \ldots, [(N-1)/2]$ are i.i.d. chi-squares with one degree of freedom. Furthermore, $N a_k^2/2\sigma^2$ and $N b_j^2/2\sigma^2$ for all $k$ and $j = 1, 2, \ldots, [(N-1)/2]$ are independent, because $a_k$ and $b_j$ are normal and, by the orthogonal property of (3.2), we have

$$\mathrm{Cov}(a_k, b_j) = \frac{4}{N^2} E\left[\sum_{t=1}^{N} Z_t\cos(\omega_k t)\sum_{u=1}^{N} Z_u\sin(\omega_j u)\right] = \frac{4}{N^2}\sum_{t=1}^{N}\left[E(Z_t^2)\cos(\omega_k t)\sin(\omega_j t)\right] = \frac{4\sigma^2}{N^2}\sum_{t=1}^{N}\left[\cos(\omega_k t)\sin(\omega_j t)\right] = 0 \quad \text{for any } k \text{ and } j \qquad (3.12)$$

It follows that

$$\frac{I(\omega_k)}{\sigma^2} = \frac{N}{2\sigma^2}(a_k^2 + b_k^2) \qquad (3.13)$$

for $k = 1, 2, \ldots, [(N-1)/2]$ are i.i.d. chi-squares with two degrees of freedom, denoted as $\chi^2(2)$. Clearly, $I(0)/\sigma^2 = N a_0^2/\sigma^2$ when $k = 0$ and $I(\pi)/\sigma^2 = N a_{N/2}^2/\sigma^2$ when $k = [N/2]$ and $N$ is even each follow a chi-square distribution with one degree of freedom. With the adjustment for $I(\pi)$ in mind, we assume, without loss of generality, that the length of the series $N$ is odd in the remaining discussion.

Clearly, if in Table 35.1 the only significant component sum of squares is at frequency $\omega_1$, then model (3.1) reduces to model (2.2), and the sums of squares for all the remaining frequencies will be combined to become the sum of squares for error with $(N - 3)$ degrees of freedom. Hence, as indicated earlier in Section 2, to test the significance of the periodic component at $\omega_1$, we can use the following test statistic

$$F = \frac{[N(a_1^2 + b_1^2)/2\sigma^2]/2}{\left[\sum_{j=2}^{[N/2]} N(a_j^2 + b_j^2)/2\sigma^2\right]/(N - 3)} = \frac{(N - 3)\,N(a_1^2 + b_1^2)}{2\sum_{j=2}^{[N/2]} N(a_j^2 + b_j^2)}$$

In other words, to test for the significance of the component at frequency $\omega_k$, we use the test statistic

$$F = \frac{[N(a_k^2 + b_k^2)/2]/2}{\left[\sum_{j=1,\, j \neq k}^{[N/2]} N(a_j^2 + b_j^2)/2\right]/(N - 3)} = \frac{(N - 3)(a_k^2 + b_k^2)}{2\sum_{j=1,\, j \neq k}^{[N/2]}(a_j^2 + b_j^2)} \qquad (3.14)$$

which follows the F-distribution with 2 and $(N - 3)$ degrees of freedom, denoted as $F(2, N - 3)$. Since the period $P$ and frequency $\omega$ are related by

$$P = 2\pi/\omega \qquad (3.15)$$

once a significant frequency $\omega_k$ is found, the period or the length of the cycle is found to be $2\pi/\omega_k$. More generally, we can test whether a series contains multiple $m$ periodic components by postulating the model

$$Z_t = \mu + \sum_{i=1}^{m}\left(a_{k_i}\cos(\omega_{k_i} t) + b_{k_i}\sin(\omega_{k_i} t)\right) + e_t \qquad (3.16)$$

where the $e_t$ are i.i.d. $N(0, \sigma^2)$, $\omega_{k_i} = 2\pi k_i/N$, and the set $I = \{k_i : i = 1, \ldots, m\}$ is a subset of $\{k : k = 1, \ldots, [N/2]\}$. The corresponding test statistic will be

$$F = \frac{(N - 2m - 1)\left[\sum_{i=1}^{m}(a_{k_i}^2 + b_{k_i}^2)\right]}{2m\sum_{j=1,\, j \notin I}^{[N/2]}(a_j^2 + b_j^2)} \qquad (3.17)$$

which follows the F-distribution with $2m$ and $(N - 2m - 1)$ degrees of freedom, i.e., $F(2m, N - 2m - 1)$.
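
The test in (3.14) can be sketched in Python as follows, for a series of odd length. The function names and the simulated series are hypothetical. The periodogram ordinates are computed as $(2/N)\,|\sum_t Z_t e^{-i\omega_k t}|^2$, which equals $N(a_k^2 + b_k^2)/2$, and the p-value uses the closed form of the upper tail of an F distribution with 2 numerator degrees of freedom, $P(F(2,\nu) > x) = (1 + 2x/\nu)^{-\nu/2}$, a standard result that is not stated in the text above.

```python
import cmath
import math
import random


def periodogram_ordinates(z):
    """I(w_k) for k = 1, ..., (N - 1)/2, assuming N is odd."""
    n = len(z)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd series length")
    return {k: (2.0 / n) * abs(sum(z[t - 1] * cmath.exp(-2j * math.pi * k * t / n)
                                   for t in range(1, n + 1))) ** 2
            for k in range(1, (n - 1) // 2 + 1)}


def f_test(z, k):
    """F statistic (3.14) and its p-value for the component at w_k."""
    n = len(z)
    ordinates = periodogram_ordinates(z)
    rest = sum(v for j, v in ordinates.items() if j != k)
    f_stat = (n - 3) * ordinates[k] / (2.0 * rest)
    nu = n - 3
    p_value = (1.0 + 2.0 * f_stat / nu) ** (-nu / 2.0)  # upper tail of F(2, nu)
    return f_stat, p_value


# Hypothetical series of length 61 with a cycle at the Fourier frequency k = 5.
random.seed(2)
z = [3.0 * math.cos(2.0 * math.pi * 5 * t / 61) + random.gauss(0.0, 1.0)
     for t in range(1, 62)]
f5, p5 = f_test(z, 5)
f17, p17 = f_test(z, 17)
print(f5, p5, f17, p17)
```

In this simulated example the component at k = 5 should be highly significant, while a frequency that carries only noise, such as k = 17, should not be.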


4 Tests for hidden periodic 
components 


In practice, even if we believe that a time series contains a periodic component, the underlying frequency is often unknown. For example, we might test the null hypothesis $H_0: \alpha = \beta = 0$ against the alternative $H_1: \alpha \neq 0$ or $\beta \neq 0$ in the model

$$Z_t = \mu + \alpha\cos(\omega t) + \beta\sin(\omega t) + e_t, \qquad (4.1)$$

where $e_t$ is a Gaussian white noise $N(0, \sigma^2)$ process, and the frequency $\omega$ is unknown. Because the frequency is unknown, the F-distribution and the test statistics as discussed in Section 3 are not directly applicable. The periodogram analysis, however, is still useful. In fact, the original purpose of the periodogram was to search for hidden periodicities. If the time series indeed contains a single periodic component at frequency $\omega$, it is hoped that the periodogram $I(\omega_k)$ at the Fourier frequency $\omega_k$ closest to $\omega$ will be the maximum. Thus, we can search out the maximum periodogram ordinate and test whether this ordinate can be reasonably considered as the maximum in a random sample of $[N/2]$ i.i.d. random variables, each being a multiple of a chi-square distribution with two degrees of freedom. Thus, a natural test statistic will be

$$I^{(1)}(\omega_{(1)}) = \max\{I(\omega_k) : k = 1, \ldots, [N/2]\} \qquad (4.2)$$

where $\omega_{(1)}$ is used to denote the Fourier frequency with the maximum periodogram ordinate.

In 1929, Ronald A. Fisher derived an exact test for $I^{(1)}(\omega_{(1)})$ based on the following statistic

$$T = \frac{I^{(1)}(\omega_{(1)})}{\sum_{k=1}^{[N/2]} I(\omega_k)} \qquad (4.3)$$

Under the null hypothesis of a $N(0, \sigma^2)$ white noise process for $Z_t$, Fisher (1929) showed that

$$P(T > g) = \sum_{j=1}^{r}(-1)^{j-1}\binom{K}{j}(1 - jg)^{K-1} \qquad (4.4)$$

where $K = (N - 1)/2$ if $N$ is odd and $K = (N/2 - 1)$ if $N$ is even, $g > 0$, and $r$ is the largest integer less than $1/g$. Thus, for any given significance level $\alpha$, we can use equation (4.4) to find the critical value $g_\alpha$ such that $P(T > g_\alpha) = \alpha$. We will reject the null hypothesis and conclude that the series contains a periodic component if the $T$ value calculated from the series is larger than $g_\alpha$. This test is known as Fisher's test. The critical values of $T$ for the significance level of $\alpha = .05$ as given by Fisher are shown in Table 35.2. As shown in the third column of Table 35.2, for most practical purposes, equation (4.4) can be approximated with the first term, i.e.,

$$P(T > g) \approx K(1 - g)^{K-1} \qquad (4.5)$$

and this is useful for the case when $K$ is not listed in the table.
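
As a rough sketch of how the critical values of Fisher's test can be obtained when $K$ is not tabulated, the following Python code evaluates the tail probability (4.4) and solves $P(T > g_\alpha) = \alpha$ by bisection. The function names and the bisection approach are assumptions of this sketch; the first-term value comes from solving (4.5) for $g$.

```python
import math


def fisher_tail(g, K):
    """P(T > g) under the white noise null hypothesis, equation (4.4)."""
    if g <= 0.0:
        return 1.0
    r = min(K, math.floor(1.0 / g))
    total = 0.0
    for j in range(1, r + 1):
        base = 1.0 - j * g
        if base <= 0.0:  # happens only when 1/g is an integer; the term is zero
            break
        total += (-1) ** (j - 1) * math.comb(K, j) * base ** (K - 1)
    return min(1.0, max(0.0, total))


def fisher_critical_value(K, alpha=0.05):
    """Solve P(T > g) = alpha for g by bisection on [1/K, 1]."""
    lo, hi = 1.0 / K, 1.0  # T can never be smaller than 1/K
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if fisher_tail(mid, K) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def first_term_critical_value(K, alpha=0.05):
    """Critical value from the first-term approximation (4.5)."""
    return 1.0 - (alpha / K) ** (1.0 / (K - 1))


for K in (5, 10, 20):
    print(K, round(fisher_critical_value(K), 5), round(first_term_critical_value(K), 5))
```

For the values of $K$ listed in Table 35.2, the two functions should reproduce the second and third columns of the table up to rounding.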

For the model in (4.1), a significant value of $T$ leads to the rejection of the null hypothesis and implies that there exists a periodic component in the series at some frequency $\omega$. This frequency, however, is not necessarily equal to $\omega_{(1)}$, because $\omega_{(1)}$ is chosen only from the Fourier frequencies and not from all possible frequencies between 0 and $\pi$. Herman O. Hartley (1949), however, has shown that the unknown $\omega$ with the maximum periodic component can be safely estimated by $\omega_{(1)}$ since $P(|\omega - \omega_{(1)}| > 2\pi/N) < \alpha$, the significance level of the test.


Table 35.2 The critical values of Fisher's test for the maximum periodogram ordinate at $\alpha = .05$

K     $g_\alpha$ (by exact formula)    $g_\alpha$ (by the first term only)
5     .68377                           .68377
10    .44495                           .44495
15    .33462                           .33463
20    .27040                           .27046
25    .22805                           .22813
30    .19784                           .19794
35    .17513                           .17525
40    .15738                           .15752
45    .14310                           .14324
50    .13135                           .13149

$K = (N - 1)/2$ if $N$ is odd and $K = (N/2 - 1)$ if $N$ is even.


Let $I^{(2)}(\omega_{(2)})$ be the second largest periodogram ordinate at Fourier frequency $\omega_{(2)}$. Peter Whittle (1952) suggested that we could extend Fisher's test for the second largest ordinate based on the test statistic

$$T_2 = \frac{I^{(2)}(\omega_{(2)})}{\sum_{k=1}^{[N/2]} I(\omega_k) - I^{(1)}(\omega_{(1)})} \qquad (4.6)$$

where the distribution in (4.4) is taken as the distribution of $T_2$, with $K$ being replaced by $(K - 1)$. The procedure can be continued until an insignificant result is reached. It leads to an estimate of $m$, the number of periodic components present in the series.

Before going further into the topic, let us con- 
sider an example of using the method to ana- 
lyze a dataset. It should be noted, however, that 
from the above discussion the length of a time 
series determines the Fourier frequencies and 
the periods to be tested. It is important that 
the length of a time series used in periodogram 
analysis should be an integer multiple of the 
fundamental period or cycle length that we are 
trying to detect. Otherwise, an artifact called 
leakage may occur, where the variance of a real 
cyclical component that cannot be accurately 
detected spills over into the sum of squares 
of other frequencies that are detected by the 
periodogram analysis. For example, if the fun- 
damental period of a series is 12, there is no 
leakage if the length of 24 observations is used. 
If 23 observations are used, the periodogram
analysis detects the components at frequencies
$2\pi k/23$, $k = 1, \ldots, 11$, which no longer include
a component of period 12.
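
The leakage effect described above can be illustrated with a small Python sketch. A noise-free cosine of period 12 is analyzed once with 24 observations and once with 23 observations, and in each case we compute the share of the corrected total sum of squares captured by the largest periodogram ordinate. The function name `dominant_share` and the artificial series are hypothetical.

```python
import cmath
import math


def dominant_share(z):
    """Largest periodogram ordinate divided by the corrected total sum of squares."""
    n = len(z)
    z_bar = sum(z) / n
    total = sum((x - z_bar) ** 2 for x in z)
    ordinates = [(2.0 / n) * abs(sum(z[t - 1] * cmath.exp(-2j * math.pi * k * t / n)
                                     for t in range(1, n + 1))) ** 2
                 for k in range(1, (n - 1) // 2 + 1)]
    return max(ordinates) / total


def cosine_series(n, period=12):
    """Noise-free cosine with the given period, observed at t = 1, ..., n."""
    return [math.cos(2.0 * math.pi * t / period) for t in range(1, n + 1)]


share_24 = dominant_share(cosine_series(24))  # length is a multiple of 12
share_23 = dominant_share(cosine_series(23))  # length is not a multiple of 12
print(round(share_24, 4), round(share_23, 4))
```

With 24 observations a single Fourier frequency accounts for essentially all of the variation, whereas with 23 observations part of it spills over into the neighboring frequencies.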

There are many statistical packages avail- 
able to perform the spectral analysis introduced 
in this chapter. They include SAS issued by 
SAS Institute (2003), S-Plus issued by Insightful
Corporation (2005), SPSS issued by SPSS, Inc.
(2006), and many others. They produce very 
similar results. We will use SAS to perform all 
our analyses in this chapter. 


Example 1 

We illustrate the periodogram analysis using 
the classical series of the monthly totals Y, of 
international airline passengers (in thousands) 
from January 1949 to December 1960 quoted by 
Robert G. Brown (1963) and made popular by 
George E. P. Box and Gwilym M. Jenkins (1976). 
The data are plotted in Figure 35.1. The series 
exhibits a clear periodic behavior with higher 
peaks in late summer months and secondary 
peaks in the spring. 

Because of its clear nonconstant variance, we calculate the periodogram of the natural logarithms of the series as plotted in Figure 35.2. The components at low frequencies near zero clearly dominate. This is due to the clear upward trend shown in the series.

Figure 35.1 Monthly totals (in thousands) of international airline passengers between January 1949 and December 1960

Figure 35.2 Periodogram of the natural logarithms of monthly totals of international airline passengers

The dominating components at low frequencies due to nonstationarity often make other components insignificant. To see the fine details of underlying characteristics, it is important to reduce a nonstationary series to a stationary series. There are many methods available to remove nonstationarity in a series. One may consider differencing or trend removal. In this example, we will analyze the series of monthly growth rates defined by

$$Z_t = 100[\log(Y_t) - \log(Y_{t-1})] \qquad (4.7)$$

Since the data form a monthly series, we are interested in detecting a possible seasonal period of 12 months. As noted earlier, to avoid possible leakage, we will consider observations from December 1949 to December 1960 to get the monthly growth rates of a full 11 years. The periodogram of these growth rates is plotted in Figure 35.3. The fine details of the periodic phenomenon are now evident. For the significance test at $\alpha = .05$ with the number of observations $N = 132$, we have $F_{.05}(2, 129) \approx 3$. The only significant components occur at the fundamental frequency $2\pi/12$ and its harmonics $2\pi k/12$, $k = 1, 2, 3, 4, 5$, and 6, indicating a strong periodic phenomenon with a fundamental period of 12 months. Thus, the monthly growth rates of international airline passengers can be very well represented by the cyclical model:

$$Z_t = \mu + \sum_{k=1}^{6}\left(a_k\cos(\omega_k t) + b_k\sin(\omega_k t)\right) + e_t \qquad (4.8)$$

where the $e_t$ are i.i.d. $N(0, \sigma^2)$, and $\omega_k = 2\pi k/12$.

Figure 35.3 Periodogram of the monthly growth rates of international airline passengers


5 The spectrum of time series 
and its estimation 


Consider a stationary time series $Z_t$ with the autocovariance function $\gamma_k$. The autocovariance generating function $\gamma(B)$ is defined as

$$\gamma(B) = \sum_{k=-\infty}^{\infty}\gamma_k B^k \qquad (5.1)$$

where the variance of the process $\gamma_0$ is the coefficient of $B^0 = 1$ and the autocovariance of lag $k$, $\gamma_k$, is the coefficient of both $B^k$ and $B^{-k}$. If the autocovariance sequence $\gamma_k$ is absolutely summable, i.e., $\sum_{k=-\infty}^{\infty}|\gamma_k| < \infty$, then the spectrum or the spectral density exists and equals

$$f(\omega) = \frac{1}{2\pi}\gamma(e^{-i\omega}) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_k e^{-i\omega k} \qquad (5.2)$$

where $-\pi \leq \omega \leq \pi$. Note that because $\gamma_k = \gamma_{-k}$, $\sin(0) = 0$, $\sin(-\omega k) = -\sin(\omega k)$, and $\cos(-\omega k) = \cos(\omega k)$, the spectrum (5.2) can be written equivalently as

$$f(\omega) = \frac{1}{2\pi}\left(\gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k\cos(\omega k)\right) \qquad (5.3)$$

Note that the spectrum $f(\omega)$ is a continuous real-valued nonnegative function. Furthermore, because $f(\omega) = f(-\omega)$, it is a symmetric even function, and its graph is normally presented only for $0 \leq \omega \leq \pi$. For a given time series of $N$ observations, however, the autocovariance of the maximum lag that can be calculated is $\hat{\gamma}_{N-1}$. Thus, we estimate $f(\omega)$ by

$$\hat{f}(\omega) = \frac{1}{2\pi}\sum_{k=-(N-1)}^{N-1}\hat{\gamma}_k e^{-i\omega k} = \frac{1}{2\pi}\left(\hat{\gamma}_0 + 2\sum_{k=1}^{N-1}\hat{\gamma}_k\cos(\omega k)\right) \qquad (5.4)$$

and call it the sample spectrum.

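To examine the properties of the sample spectrum, let us consider $\hat{f}(\omega_j)$ at the Fourier frequency $\omega_j = 2\pi j/N$, $j = 1, \ldots, [N/2]$. At these Fourier frequencies, the sample spectrum and the periodogram are closely related. To see this, we note that

$$\begin{aligned} I(\omega_j) &= \frac{N}{2}(a_j^2 + b_j^2) = \frac{N}{2}(a_j - i b_j)(a_j + i b_j) \\ &= \frac{N}{2}\left[\frac{2}{N}\sum_{t=1}^{N} Z_t\left(\cos(\omega_j t) - i\sin(\omega_j t)\right)\right]\left[\frac{2}{N}\sum_{t=1}^{N} Z_t\left(\cos(\omega_j t) + i\sin(\omega_j t)\right)\right] \\ &= \frac{2}{N}\left[\sum_{t=1}^{N} Z_t e^{-i\omega_j t}\right]\left[\sum_{t=1}^{N} Z_t e^{i\omega_j t}\right] \\ &= \frac{2}{N}\left[\sum_{t=1}^{N}(Z_t - \bar{Z})e^{-i\omega_j t}\right]\left[\sum_{t=1}^{N}(Z_t - \bar{Z})e^{i\omega_j t}\right] \\ &= \frac{2}{N}\sum_{t=1}^{N}\sum_{s=1}^{N}(Z_t - \bar{Z})(Z_s - \bar{Z})e^{-i\omega_j(t-s)} \end{aligned} \qquad (5.5)$$

where we use the Euler relation, $e^{\pm i\omega_j} = \cos(\omega_j) \pm i\sin(\omega_j)$, and the fact that for $j \neq 0$,

$$\sum_{t=1}^{N} e^{i\omega_j t} = \sum_{t=1}^{N}\left[\cos(\omega_j t) + i\sin(\omega_j t)\right] = \sum_{t=1}^{N}\cos(\omega_0 t)\cos(\omega_j t) + i\sum_{t=1}^{N}\cos(\omega_0 t)\sin(\omega_j t) = 0$$

Now, $\hat{\gamma}_k = \frac{1}{N}\sum_{t=k+1}^{N}(Z_{t-k} - \bar{Z})(Z_t - \bar{Z})$. Letting $k = t - s$ in (5.5), we have

$$\begin{aligned} I(\omega_j) &= 2\sum_{k=-(N-1)}^{N-1}\frac{1}{N}\sum_{t=k+1}^{N}(Z_{t-k} - \bar{Z})(Z_t - \bar{Z})e^{-i\omega_j k} \\ &= 2\sum_{k=-(N-1)}^{N-1}\hat{\gamma}_k e^{-i\omega_j k} \\ &= 2\left(\hat{\gamma}_0 + 2\sum_{k=1}^{N-1}\hat{\gamma}_k\cos(\omega_j k)\right) \end{aligned} \qquad (5.6)$$

Hence, from (5.4), we have

$$\hat{f}(\omega_j) = \frac{1}{4\pi} I(\omega_j) \qquad (5.7)$$

Since the cosine function is a periodic continuous function with a period $2\pi$, the periodogram in (5.6) can easily be extended to all $\omega$ as the following periodic continuous function between $-\pi$ and $\pi$:

$$I(\omega) = 2\sum_{k=-(N-1)}^{N-1}\hat{\gamma}_k e^{-i\omega k} = 2\left(\hat{\gamma}_0 + 2\sum_{k=1}^{N-1}\hat{\gamma}_k\cos(\omega k)\right)$$

Because $I(\omega_k)$ for $\omega_k = 2\pi k/N$, $k = 1, \ldots, [N/2]$, is the standard output from a periodogram analysis, equation (5.7) becomes a natural candidate for estimating the spectrum and is known as the periodogram estimator of the spectrum.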


To see the properties of this estimator, we recall from (3.13) that for a Gaussian white noise series with mean 0 and constant variance $\sigma^2$, $\hat{f}(\omega_k)$, for $k = 1, \ldots, [N/2]$, are actually distributed independently and identically as $(1/4\pi)\sigma^2\chi^2(2) = (\sigma^2/2\pi)\chi^2(2)/2$, which is to be denoted as

$$\hat{f}(\omega_k) \sim \frac{\sigma^2}{2\pi}\,\frac{\chi^2(2)}{2} \qquad (5.8)$$

where we note that from (5.3), $\sigma^2/2\pi$ is in fact the spectrum for the given Gaussian white noise process with mean 0 and variance $\sigma^2$. More generally, following the same arguments, for a general Gaussian process with a spectrum $f(\omega)$, the sample spectrum calculated at Fourier frequencies $\omega_k$ in (5.7) has the following asymptotic distribution

$$\hat{f}(\omega_k) \sim f(\omega_k)\,\frac{\chi^2(2)}{2} \quad \text{as } N \to \infty \qquad (5.9)$$

Hence,

$$\lim_{N\to\infty} E(\hat{f}(\omega_k)) = f(\omega_k) \qquad (5.10)$$

and

$$\lim_{N\to\infty}\mathrm{Var}(\hat{f}(\omega_k)) = \mathrm{Var}\left[f(\omega_k)\,\frac{\chi^2(2)}{2}\right] = [f(\omega_k)]^2 \qquad (5.11)$$

which is independent of the sample size $N$. Thus, although the periodogram estimator of the spectrum in (5.7) calculated at a Fourier frequency is asymptotically unbiased, it is an unsatisfactory estimator because it is not consistent. The variance of $\hat{f}(\omega_k)$ does not reduce to zero as the sample size $N$ goes to infinity. The effort of correcting this deficiency leads to the smoothing of the periodogram estimator.
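
The relation (5.7) between the sample spectrum (5.4) and the periodogram can be checked numerically. The Python sketch below computes the sample spectrum from the sample autocovariances and the periodogram from the complex exponential form in (5.5); the function names and the short numeric series are hypothetical and serve only as an illustration.

```python
import cmath
import math


def sample_autocovariance(z, k):
    """Sample autocovariance at lag k >= 0 with divisor N."""
    n = len(z)
    z_bar = sum(z) / n
    return sum((z[t - k] - z_bar) * (z[t] - z_bar) for t in range(k, n)) / n


def sample_spectrum(z, w):
    """Sample spectrum (5.4) in its cosine form at frequency w."""
    n = len(z)
    total = sample_autocovariance(z, 0)
    total += 2.0 * sum(sample_autocovariance(z, k) * math.cos(w * k) for k in range(1, n))
    return total / (2.0 * math.pi)


def periodogram_ordinate(z, j):
    """I(w_j) = (2/N) |sum_t Z_t exp(-i w_j t)|^2 for 1 <= j <= [(N-1)/2]."""
    n = len(z)
    w = 2.0 * math.pi * j / n
    return (2.0 / n) * abs(sum(z[t - 1] * cmath.exp(-1j * w * t) for t in range(1, n + 1))) ** 2


data = [2.1, 3.4, 1.9, 4.2, 5.0, 3.3, 2.8, 4.9, 6.1, 5.2, 3.7, 4.4, 5.8]  # hypothetical
n = len(data)
for j in range(1, (n - 1) // 2 + 1):
    w_j = 2.0 * math.pi * j / n
    print(j, round(sample_spectrum(data, w_j), 6),
          round(periodogram_ordinate(data, j) / (4.0 * math.pi), 6))
```

The two printed columns agree at every Fourier frequency, as stated in (5.7).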


5.1 The smoothed periodogram estimator 


A natural way to reduce the variance of the periodogram estimator of the spectrum is to smooth the periodogram locally in the neighborhood of the target frequency. In other words, we obtain a smoothed periodogram estimator from the following weighted average of $m$ values to the right and left of a target frequency $\omega_k$, i.e.,

$$\hat{f}_W(\omega_k) = \sum_{j=-m}^{m} W_N(\omega_j)\hat{f}(\omega_k - \omega_j) \qquad (5.12)$$

where $m$ is a function of $N$, which is often chosen such that $m \to \infty$ but $(m/N) \to 0$ as $N \to \infty$; and $W_N(\omega_j)$ is the weighting function with the following properties

$$\sum_{j=-m}^{m} W_N(\omega_j) = 1 \qquad (5.13)$$

$$W_N(\omega_j) = W_N(-\omega_j) \qquad (5.14)$$

and

$$\lim_{N\to\infty}\sum_{j=-m}^{m} W_N^2(\omega_j) = 0 \qquad (5.15)$$

The weighting function $W_N(\omega_j)$ is called the spectral window because only some of the spectral ordinates are utilized and shown in the smoothing. If $f(\omega)$ is flat and constant within the window, then

$$\lim_{N\to\infty} E[\hat{f}_W(\omega_k)] = \lim_{N\to\infty}\sum_{j=-m}^{m} W_N(\omega_j)E[\hat{f}(\omega_k - \omega_j)] = f(\omega_k)$$

and

$$\lim_{N\to\infty}\mathrm{Var}[\hat{f}_W(\omega_k)] = [f(\omega_k)]^2\lim_{N\to\infty}\sum_{j=-m}^{m} W_N^2(\omega_j) = 0 \qquad (5.16)$$


where we use the result that the periodogram ordinates at different Fourier frequencies are independent.

The property (5.16) of the spectral window implies that the variance of the smoothed periodogram estimator decreases as $N$ and hence $m$ increases. The value of $m$ represents the number of frequencies used in the smoothing. This value is directly related to the width of the spectral window, also known as the bandwidth of the window. As the bandwidth increases, more spectral ordinates are averaged; hence, the resulting estimator becomes smoother, more stable, and has smaller variance. Unless $f(\omega)$ is really flat, however, the bias also increases as the bandwidth increases because more and more spectral ordinates are used in the smoothing. We are thus forced to compromise between variance reduction and bias, a common dilemma with many statistical estimators.

Because the periodogram is periodic with period $2\pi$, when the window covers frequencies that fail to lie entirely in the range between $-\pi$ and $\pi$, we can extend the periodogram using this periodic property. Equivalently, we can fold the weights back into the interval $-\pi$ to $\pi$. As the periodogram is also symmetric about frequency zero, calculation is only necessary for the frequency range between zero and $\pi$. Also, as shown earlier, because the periodogram at frequency zero reflects the sample mean of the series and not the spectrum, it is not included in the smoothing, and the value at $\omega_1$ is used in its place.
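
The smoothing in (5.12), together with the periodic and symmetric extension of the periodogram and the substitution of the value at $\omega_1$ for the value at frequency zero, can be sketched in Python as follows. Equal weights $1/(2m+1)$ are used here as one simple choice satisfying (5.13) and (5.14); the function names, the odd series length, and the simulated white noise series are assumptions of this sketch.

```python
import cmath
import math
import random


def periodogram(z):
    """I(w_k) for k = 1, ..., (N - 1)/2, assuming N is odd."""
    n = len(z)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd series length")
    return {k: (2.0 / n) * abs(sum(z[t - 1] * cmath.exp(-2j * math.pi * k * t / n)
                                   for t in range(1, n + 1))) ** 2
            for k in range(1, n // 2 + 1)}


def smoothed_periodogram(ordinates, n, m):
    """Equal-weight moving average over 2m + 1 neighboring Fourier frequencies."""
    weight = 1.0 / (2 * m + 1)

    def lookup(index):
        index = abs(index) % n   # symmetry about zero, then period N in the index
        if index > n // 2:
            index = n - index    # fold back into the range (0, pi)
        if index == 0:
            index = 1            # use the value at w_1 in place of frequency zero
        return ordinates[index]

    return {k: sum(weight * lookup(k - j) for j in range(-m, m + 1)) for k in ordinates}


def variance(values):
    values = list(values)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(4)
noise = [random.gauss(0.0, 1.0) for _ in range(101)]
raw = periodogram(noise)
smooth = smoothed_periodogram(raw, len(noise), m=3)
print(round(variance(raw.values()), 3), round(variance(smooth.values()), 3))
```

For a white noise series, whose spectrum is flat, the smoothed ordinates should vary much less across frequencies than the raw ordinates.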

From (5.4), we see that an alternative approach to perform the smoothing is to apply a weighting function $\lambda_N(k)$ to the sample autocovariances, i.e.,

$$\hat{f}_W(\omega) = \frac{1}{2\pi}\sum_{k=-(N-1)}^{N-1}\lambda_N(k)\hat{\gamma}_k e^{-i\omega k} \qquad (5.17)$$

Because the sample autocovariance function $\hat{\gamma}_k$ is symmetric and $\hat{\gamma}_k$ is less reliable for larger $k$, the weighting function $\lambda_N(k)$ should be chosen to be symmetric with its weights inversely proportional to the magnitude of $k$. Thus,

$$\hat{f}_W(\omega) = \frac{1}{2\pi}\sum_{k=-M}^{M}\lambda_N(k)\hat{\gamma}_k e^{-i\omega k} \qquad (5.18)$$

where the weighting function $\lambda_N(k)$ is chosen to be an absolutely summable sequence

$$\lambda_N(k) = \lambda\left(\frac{k}{M}\right) \qquad (5.19)$$

which is often derived from a bounded continuous function $\lambda(x)$ satisfying

$$|\lambda(x)| \leq 1$$
$$\lambda(0) = 1$$
$$\lambda(x) = \lambda(-x)$$
$$\lambda(x) = 0, \quad |x| > 1$$

The value of $M$ is the truncation point that depends on the sample size $N$. This weighting function $\lambda_N(k)$ for the autocovariances is called the lag window. It can be shown (see William W.S. Wei, 2006, p. 305) that the spectral window and lag window form a Fourier transform pair. The spectral window is the Fourier transform of the lag window, i.e.,

$$W_N(\omega) = \frac{1}{2\pi}\sum_{k=-M}^{M}\lambda_N(k)e^{-i\omega k} \qquad (5.20)$$

and the lag window is the inverse Fourier transform of the spectral window, i.e.,

$$\lambda_N(k) = \int_{-\pi}^{\pi} W_N(\omega)e^{i\omega k}\,d\omega, \quad k = 0, \pm 1, \ldots, \pm M \qquad (5.21)$$

Both the terms spectral window and lag window were introduced by Ralph B. Blackman and John W. Tukey (1958). The weighting function was the standard term used in the earlier literature.


There are many windows introduced in the literature. The following are some commonly used windows.

Rectangular window:

$$\lambda_N(k) = \begin{cases} 1, & |k| \leq M \\ 0, & |k| > M \end{cases} \qquad (5.22a)$$

where $M$ is the truncation point less than $(N - 1)$, derived from the continuous rectangular function

$$\lambda(x) = \begin{cases} 1, & |x| \leq 1 \\ 0, & |x| > 1 \end{cases} \qquad (5.22b)$$

The corresponding spectral window is given by

$$W_N(\omega) = \frac{1}{2\pi}\,\frac{\sin[\omega(M + 1/2)]}{\sin(\omega/2)} \qquad (5.22c)$$


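Bartlett window: Maurice S. Bartlett (1950) proposed the lag window

$$\lambda_N(k) = \begin{cases} 1 - |k|/M, & |k| \leq M \\ 0, & |k| > M \end{cases} \qquad (5.23a)$$

based on the triangular function

$$\lambda(x) = \begin{cases} 1 - |x|, & |x| \leq 1 \\ 0, & |x| > 1 \end{cases} \qquad (5.23b)$$

and hence the window is also known as the triangular window. The corresponding spectral window is given by

$$W_N(\omega) = \frac{1}{2\pi M}\left[\frac{\sin(\omega M/2)}{\sin(\omega/2)}\right]^2 \qquad (5.23c)$$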


Blackman-Tukey window: Blackman and Tukey (1958) proposed the lag window

$$\lambda_N(k) = \begin{cases} 1 - 2a + 2a\cos(\pi k/M), & |k| \leq M \\ 0, & |k| > M \end{cases} \qquad (5.24a)$$

based on the continuous function

$$\lambda(x) = \begin{cases} 1 - 2a + 2a\cos(\pi x), & |x| \leq 1 \\ 0, & |x| > 1 \end{cases} \qquad (5.24b)$$

where $0 < a \leq .25$. The corresponding spectral window is given by

$$W_N(\omega) = \frac{a}{2\pi}\,\frac{\sin[(\omega - \pi/M)(M + 1/2)]}{\sin[(\omega - \pi/M)/2]} + \frac{(1 - 2a)}{2\pi}\,\frac{\sin[\omega(M + 1/2)]}{\sin(\omega/2)} + \frac{a}{2\pi}\,\frac{\sin[(\omega + \pi/M)(M + 1/2)]}{\sin[(\omega + \pi/M)/2]} \qquad (5.24c)$$

When $a = .23$, the window is known as the Hamming or Tukey-Hamming window, and when $a = .25$, the window is also known as the Hanning or Tukey-Hanning window or Tukey window.

Parzen window: Emanuel Parzen (1961) suggested the lag window

$$\lambda_N(k) = \begin{cases} 1 - 6(k/M)^2 + 6(|k|/M)^3, & |k| \le M/2 \\ 2(1 - |k|/M)^3, & M/2 < |k| \le M \\ 0, & |k| > M \end{cases} \qquad (5.25a)$$

based on the continuous function

$$\lambda(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & |x| \le 1/2 \\ 2(1 - |x|)^3, & 1/2 < |x| \le 1 \\ 0, & |x| > 1 \end{cases} \qquad (5.25b)$$

The corresponding spectral window is given by

$$W_N(\omega) = \frac{3}{8\pi M^3}\left[\frac{\sin(\omega M/4)}{\frac{1}{2}\sin(\omega/2)}\right]^4 \left\{1 - 2[\sin(\omega/2)]^2/3\right\} \qquad (5.25c)$$

In addition, because of its simplicity, especially in terms of
deriving the sampling properties of spectrum estimates, a simple
M-term moving average of the periodogram surrounding a target Fourier
frequency $\omega_k$ is also commonly used. Most statistical packages
also accept a user-provided weighting function. The spectral windows
given in (5.22c) through (5.25c) are obtained through (5.20). For
details, we refer readers to Wei (2006, Section 13.3.3).
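
The relation between the lag windows and their spectral windows can be checked numerically. The following is a minimal sketch, not the chapter's own code: the function names are illustrative assumptions, and the check simply evaluates the sum in (5.20) directly and compares it with the closed forms (5.22c) and (5.23c).

```python
import math

def rectangular(x):
    """Continuous rectangular function (5.22b)."""
    return 1.0 if abs(x) <= 1.0 else 0.0

def bartlett(x):
    """Continuous triangular function (5.23b)."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def blackman_tukey(x, a=0.25):
    """Continuous Blackman-Tukey function (5.24b); a = 0.25 gives the Tukey-Hanning case."""
    return 1.0 - 2.0 * a + 2.0 * a * math.cos(math.pi * x) if abs(x) <= 1.0 else 0.0

def parzen(x):
    """Continuous Parzen function (5.25b)."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def spectral_window(lag_fn, M, omega):
    """Evaluate (5.20) directly. The lag window is symmetric in k,
    so the imaginary parts cancel and only the cosine terms remain."""
    total = sum(lag_fn(k / M) * math.cos(omega * k) for k in range(-M, M + 1))
    return total / (2.0 * math.pi)

def rectangular_closed_form(M, omega):
    """Closed form (5.22c)."""
    return math.sin(omega * (M + 0.5)) / (2.0 * math.pi * math.sin(omega / 2.0))

def bartlett_closed_form(M, omega):
    """Closed form (5.23c)."""
    return (math.sin(omega * M / 2.0) / math.sin(omega / 2.0)) ** 2 / (2.0 * math.pi * M)
```

Any other window, including a user-provided weighting function, can be passed to `spectral_window` in the same way, for example `spectral_window(parzen, 5, 0.7)`.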

The quality of a smoothed sample spectrum is determined by the shape
and the bandwidth of the window. For the same window shape, different
bandwidths give different spectrum estimates. In smoothing a sample
spectrum, we are therefore concerned not only with designing a
spectral window of desirable shape (“window carpentry,” as Tukey
called it), but also with the bandwidth of the window. The latter
concern is often the more crucial and difficult one in spectral
analysis because, for a given window shape, there is no single
criterion for choosing the optimal bandwidth. To ease the difficulty,
the following steps are often suggested. First, choose a spectral
window with a desirable shape. Then calculate spectral estimates
using a large bandwidth, and recalculate them using gradually smaller
bandwidths until the required stability and resolution are achieved.
This procedure is often known as “window closing.” Alternatively,
because the bandwidth of a spectral window is inversely related to the
truncation point M used in the lag window, the bandwidth can also be
determined by choosing a truncation point M such that
$\hat{\gamma}_k$ for $k > M$ are negligible.
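
The window-closing procedure can be sketched as follows. This is only an illustration under stated assumptions: the series is synthetic (it is not the airline data), the estimator is the usual lag-window form $(1/2\pi)\sum_{|k| \le M}\lambda(k/M)\hat{\gamma}_k\cos(\omega k)$ with the Parzen window, and all function names are hypothetical.

```python
import math

def parzen(u):
    """Continuous Parzen function (5.25b)."""
    u = abs(u)
    if u <= 0.5:
        return 1.0 - 6.0 * u**2 + 6.0 * u**3
    if u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0

def sample_autocovariance(x, k):
    """Biased sample autocovariance at lag k >= 0 (divisor N)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n

def smoothed_spectrum(x, M, omega):
    """Lag-window spectrum estimate at frequency omega with truncation point M."""
    total = sample_autocovariance(x, 0)
    for k in range(1, M + 1):
        total += 2.0 * parzen(k / M) * sample_autocovariance(x, k) * math.cos(omega * k)
    return total / (2.0 * math.pi)

# Synthetic series of length 132: a period-12 cycle plus a weaker, faster cycle.
series = [math.sin(2 * math.pi * t / 12) + 0.3 * math.cos(2 * math.pi * t / 5)
          for t in range(1, 133)]
seasonal = 2 * math.pi / 12

# Window closing: a small M means a wide bandwidth; increasing M narrows it.
closing_path = {M: smoothed_spectrum(series, M, seasonal) for M in (4, 8, 16, 32)}
```

Inspecting `closing_path` shows how the estimate at the seasonal frequency sharpens as M grows; in practice one stops when further increases in M mainly add instability rather than resolution.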


Example 2 

For illustration, we now obtain the spectrum estimate by smoothing
the periodogram of the monthly growth rates of international airline
passengers using the Parzen window with M = 5. The resulting spectrum
is plotted in Figure 35.4. The smooth curve retains the clear periodic
pattern, with a fundamental period of 12, that is typical of a monthly
seasonal time series.

Figure 35.4 A smoothed sample spectrum of the monthly growth rates of
international airline passengers


5.2 Approximate confidence interval 
for the spectrum 


Consider a time series from a process with the spectrum $f(\omega)$.
Let $\hat{f}(\omega_k)$ be the unsmoothed sample spectral ordinates at
the Fourier frequencies $\omega_k = 2\pi k/N$, with $\omega_k \ne 0$ or
$\pi$. From (5.9), they are independently and identically distributed
as

$$\hat{f}(\omega_k) \sim f(\omega_k)\,\frac{\chi^2(2)}{2} \qquad (5.26)$$

or equivalently,

$$\frac{2\hat{f}(\omega_k)}{f(\omega_k)} \sim \chi^2(2) \qquad (5.27)$$


However, this result is no longer true for a general smoothed sample
spectrum. Let $\hat{f}_W(\omega)$ be the general smoothed sample
spectral ordinate at frequency $\omega$. John W. Tukey (1949) suggested
approximating the distribution of $\hat{f}_W(\omega)$ by a distribution
of the form $c\chi^2(\nu)$, where $c$ and $\nu$ are chosen so that its
mean and variance are equal to the asymptotic mean and variance of
$\hat{f}_W(\omega)$. Thus, it can be shown (see William W.S. Wei, 2006,
p. 316) that asymptotically we have

$$\frac{\nu\hat{f}_W(\omega)}{f(\omega)} \sim \chi^2(\nu) \qquad (5.28)$$


where $\nu$ is known as the equivalent degrees of freedom for the
smoothed spectrum. The value of $\nu$ depends on the window used in the
smoothing and is computed as

$$\nu = \frac{2N}{M\int_{-1}^{1}\lambda^2(x)\,dx} \qquad (5.29)$$

where $\lambda(x)$ is the continuous weighting function used in the
associated lag window. For example, for the Bartlett window, we have

$$\nu = \frac{2N}{M\int_{-1}^{1}(1 - |x|)^2\,dx} = \frac{2N}{M\left[\int_{-1}^{0}(1 + x)^2\,dx + \int_{0}^{1}(1 - x)^2\,dx\right]} = \frac{3N}{M}$$
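
Equation (5.29) can also be evaluated numerically for any window. The sketch below is illustrative only: it uses a midpoint rule (an assumption; any quadrature would do) and hypothetical function names, and it recovers the factors 3 for the Bartlett window and roughly 3.709 for the Parzen window.

```python
def bartlett(x):
    """Continuous triangular function (5.23b)."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def parzen(x):
    """Continuous Parzen function (5.25b)."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def equivalent_df(window, N, M, steps=20000):
    """nu = 2N / (M * integral of lambda(x)^2 over [-1, 1]), by the midpoint rule."""
    h = 2.0 / steps
    integral = h * sum(window(-1.0 + (i + 0.5) * h) ** 2 for i in range(steps))
    return 2.0 * N / (M * integral)

# Sample size and truncation point as in the airline example of the text.
nu_bartlett = equivalent_df(bartlett, N=132, M=5)
nu_parzen = equivalent_df(parzen, N=132, M=5)
```

A wider window function (larger integral of $\lambda^2$) yields fewer equivalent degrees of freedom for the same N and M, which is why the rectangular window is less stable than the Parzen window at a given truncation point.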


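Thus, from (5.28), we obtain the following $(1 - \alpha)100\%$
confidence interval for the spectrum

$$\frac{\nu\hat{f}_W(\omega)}{\chi^2_{\alpha/2}(\nu)} \le f(\omega) \le \frac{\nu\hat{f}_W(\omega)}{\chi^2_{1-\alpha/2}(\nu)} \qquad (5.30)$$

where $\chi^2_{\alpha}(\nu)$ is the upper $\alpha\%$ point of the
chi-square distribution with $\nu$ degrees of freedom.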


Example 3 

For illustration, we calculate the 95% confidence interval for the
spectrum of the underlying process that generates the monthly growth
rates of international airline passengers using the Parzen window with
M = 5. For N = 132 and M = 5, we have, from equations (5.25b) and
(5.29), $\nu = 3.709N/M = 3.709(132/5) = 97.9176 \approx 98$. Since
$\chi^2_{.975}(98) = 72.501$ and $\chi^2_{.025}(98) = 127.282$, the 95%
confidence interval for $f(\omega)$, from (5.30), becomes

$$.77\hat{f}_W(\omega) \le f(\omega) \le 1.35\hat{f}_W(\omega)$$

where $\hat{f}_W(\omega)$ is the spectrum estimate using the Parzen
window with M = 5 given in Figure 35.4. For example, since
$\hat{f}_W(.5236) = 90.9371$, the 95% confidence interval for
$f(\omega)$ at $\omega = .5236$ is given by

$$70.022 \le f(\omega = .5236) \le 122.765$$

The confidence intervals (dotted lines) for other frequencies can be
calculated similarly, and they are shown in Figure 35.5.

Figure 35.5 The 95% confidence intervals for the spectrum of the
monthly growth rates of the international airline passengers using the
Parzen window (vertical axis: spectrum; horizontal axis: frequency
from 0 to $\pi$)
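
The arithmetic of Example 3 can be retraced with a few lines of code. This is a sketch under stated assumptions: the chi-square percentage points are the two values quoted in the text (they are not computed here), and the function name is hypothetical. Working with unrounded factors gives limits that differ slightly from the printed ones, which appear to be based on the rounded multipliers .77 and 1.35.

```python
def spectrum_confidence_interval(f_hat, nu, chi2_upper, chi2_lower):
    """Interval (5.30): nu*f_hat/chi2_upper <= f <= nu*f_hat/chi2_lower.

    chi2_upper is the upper alpha/2 point and chi2_lower the upper
    1 - alpha/2 point of the chi-square distribution with nu d.f.
    """
    return nu * f_hat / chi2_upper, nu * f_hat / chi2_lower

nu = 98            # equivalent degrees of freedom, rounded as in the text
f_hat = 90.9371    # smoothed estimate at omega = .5236, quoted in the text

lower, upper = spectrum_confidence_interval(
    f_hat, nu, chi2_upper=127.282, chi2_lower=72.501)

# Limits obtained from the rounded multipliers used in the text.
lower_rounded = round(0.77 * f_hat, 3)
upper_rounded = round(1.35 * f_hat, 3)
```

Because the multipliers $\nu/\chi^2$ do not depend on the frequency, the same two factors scale the smoothed estimate at every $\omega$, which is why the bands in Figure 35.5 follow the shape of the estimate.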


6 Relationships between two time
series and cross-spectrum


6.1 Cross-covariance and cross-spectrum 


Very often time series are observed concurrently on two variables,
and we are interested in detecting and describing the relationships
between the two series. Given two processes $X_t$ and $Y_t$ for
$t = 0, \pm 1, \pm 2, \ldots$, they are said to be jointly stationary
if $X_t$ and $Y_t$ are each stationary and the cross-covariance between
$X_t$ and $Y_t$ is a function of the time difference only. In such a
case, the cross-covariance function between $X_t$ and $Y_t$ is given by

$$\gamma_{XY}(k) = E[(X_t - \mu_X)(Y_{t+k} - \mu_Y)] \qquad (6.1)$$

for $k = 0, \pm 1, \pm 2, \ldots$, where $\mu_X = E(X_t)$ and
$\mu_Y = E(Y_t)$. Upon standardization, we obtain the cross-correlation
function:

$$\rho_{XY}(k) = \frac{\gamma_{XY}(k)}{\sigma_X\sigma_Y} \qquad (6.2)$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X_t$
and $Y_t$, respectively. While the cross-covariance and
cross-correlation functions are useful measures of the relations
between two processes, they are defined only in terms of integer lags.
If we want to study the relationship between the two series at any
frequency, and hence at any time lag, we extend the univariate
spectral analysis to cross-spectral analysis. For a joint process
$X_t$ and $Y_t$ with an absolutely summable cross-covariance function,
its cross-spectrum or cross-spectral density is given by


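$$f_{XY}(\omega) = \frac{1}{2\pi}\gamma_{XY}(e^{-i\omega}) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_{XY}(k)\,e^{-i\omega k} = c_{XY}(\omega) - i\,q_{XY}(\omega) \qquad (6.3)$$

where the real portion

$$c_{XY}(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_{XY}(k)\cos(\omega k)$$

is known as the cospectrum, and the imaginary portion

$$q_{XY}(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_{XY}(k)\sin(\omega k)$$

is known as the quadrature spectrum.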
We can also write the cross-spectrum in the following polar form

$$f_{XY}(\omega) = A_{XY}(\omega)\,e^{i\phi_{XY}(\omega)} \qquad (6.4)$$

where

$$A_{XY}(\omega) = |f_{XY}(\omega)| = \left[c_{XY}^2(\omega) + q_{XY}^2(\omega)\right]^{1/2} \qquad (6.5)$$

and

$$\phi_{XY}(\omega) = \tan^{-1}\left[\frac{-q_{XY}(\omega)}{c_{XY}(\omega)}\right] \qquad (6.6)$$

The functions $A_{XY}(\omega)$ and $\phi_{XY}(\omega)$ are called the
cross-amplitude spectrum and the phase spectrum, respectively. In
addition, to help interpretation, we often consider two other useful
functions, the coherence and the gain function. The coherence (or the
squared coherency), $K^2_{XY}(\omega)$, is defined by

$$K^2_{XY}(\omega) = \frac{|f_{XY}(\omega)|^2}{f_X(\omega)f_Y(\omega)} \qquad (6.7)$$

where $f_X(\omega)$ and $f_Y(\omega)$ are the spectra of $X_t$ and
$Y_t$, respectively. The gain function is defined as

$$G_{XY}(\omega) = \frac{|f_{XY}(\omega)|}{f_X(\omega)} = \frac{A_{XY}(\omega)}{f_X(\omega)} \qquad (6.8)$$

which is the ratio of the cross-amplitude spectrum to the input
spectrum.

The cross-amplitude spectrum measures the covariance between the $X_t$
and $Y_t$ processes at frequency $\omega$. The coherence, like $R^2$ in
regression, indicates the correlation between the $\omega$-frequency
component of $X_t$ and the $\omega$-frequency component of $Y_t$.
Clearly, $0 \le K^2_{XY}(\omega) \le 1$. A value of $K^2_{XY}(\omega)$
close to 1 implies that the $\omega$-frequency components of the two
series are highly linearly related, and a value of $K^2_{XY}(\omega)$
close to 0 implies that they are only slightly linearly related or not
linearly related. The gain function is simply the absolute value of
the standard least squares regression coefficient of the
$\omega$-frequency component of $X_t$. The phase spectrum,
$\phi_{XY}(\omega)$, is a measure of the extent to which each frequency
component of one series leads or lags the other. For example, in a
simple causal model where there is no feedback relationship between
$X_t$ and $Y_t$, series $X_t$ leads series $Y_t$ at frequency $\omega$
if the phase $\phi_{XY}(\omega)$ is negative, i.e.,
$Y_t = \alpha X_{t-m} + e_t$; and series $X_t$ lags series $Y_t$ at
frequency $\omega$ if the phase $\phi_{XY}(\omega)$ is positive, i.e.,
$X_t = \beta Y_{t-m} + e_t$. For a given $\phi_{XY}(\omega)$ at
frequency $\omega$, the actual time unit is given by
$\phi_{XY}(\omega)/\omega$. Hence, the actual lead time from series
$X_t$ to series $Y_t$ at frequency $\omega$ is equal to

$$-\frac{\phi_{XY}(\omega)}{\omega} \qquad (6.9)$$

which is not necessarily an integer. A negative lead time in (6.9)
indicates that series $X_t$ lags series $Y_t$ at frequency $\omega$.
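
The definitions (6.5) through (6.9) translate directly into code. The sketch below is illustrative: the function name and the dictionary keys are assumptions, and `math.atan2` is used in place of a plain inverse tangent so that the phase lands in the correct quadrant when the cospectrum is negative.

```python
import math

def cross_spectral_functions(c, q, f_x, f_y, omega):
    """Summaries of the cross-spectrum f_XY = c - i*q at a frequency omega > 0."""
    amplitude = math.hypot(c, q)            # (6.5)
    phase = math.atan2(-q, c)               # (6.6), quadrant-aware
    coherence = amplitude**2 / (f_x * f_y)  # (6.7)
    gain = amplitude / f_x                  # (6.8)
    lead_time = -phase / omega              # (6.9): positive when X leads Y
    return {
        "amplitude": amplitude,
        "phase": phase,
        "coherence": coherence,
        "gain": gain,
        "lead_time": lead_time,
    }

# Idealized case: Y_t = X_{t-3} exactly, so f_XY = exp(-3i*omega) * f_X and f_Y = f_X.
omega, delay, f_x = 0.5, 3, 2.0
c = f_x * math.cos(omega * delay)
q = f_x * math.sin(omega * delay)
summary = cross_spectral_functions(c, q, f_x, f_x, omega)
```

In this idealized case the phase is negative, the lead time equals the delay of three time units, and the coherence and gain are both one, in line with the interpretation given above.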


6.2 Estimation of the cross-spectrum 

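Given a bivariate series $X_t$ and $Y_t$ for $t = 1, 2, \ldots, N$, let
$\hat{f}_X(\omega)$ and $\hat{f}_Y(\omega)$ be the smoothed spectrum
estimates of $f_X(\omega)$ and $f_Y(\omega)$, respectively, and let

$$\hat{\gamma}_{XY}(k) = \begin{cases} \dfrac{1}{N}\sum_{t=1}^{N-k}(X_t - \bar{X})(Y_{t+k} - \bar{Y}), & k \ge 0 \\ \dfrac{1}{N}\sum_{t=1-k}^{N}(X_t - \bar{X})(Y_{t+k} - \bar{Y}), & k < 0 \end{cases} \qquad (6.10)$$

be the sample cross-covariances. We extend the smoothing method
discussed in Section 5 and estimate the cross-spectrum by

$$\hat{f}_{XY}(\omega) = \frac{1}{2\pi}\sum_{k=-M_{XY}}^{M_{XY}}\lambda_{XY}(k)\,\hat{\gamma}_{XY}(k)\,e^{-i\omega k} \qquad (6.11)$$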


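where $M_{XY}$ and $\lambda_{XY}(k)$ are the corresponding truncation
point and lag window for the smoothing. The same lag windows
introduced in Section 5.1 can be used. Clearly,

$$\hat{c}_{XY}(\omega) = \frac{1}{2\pi}\sum_{k=-M_{XY}}^{M_{XY}}\lambda_{XY}(k)\,\hat{\gamma}_{XY}(k)\cos(\omega k)$$

and

$$\hat{q}_{XY}(\omega) = \frac{1}{2\pi}\sum_{k=-M_{XY}}^{M_{XY}}\lambda_{XY}(k)\,\hat{\gamma}_{XY}(k)\sin(\omega k)$$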


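Hence, we may estimate the cross-amplitude spectrum, the coherence,
the gain function, and the phase spectrum as follows:

$$\hat{A}_{XY}(\omega) = |\hat{f}_{XY}(\omega)| = \left[\hat{c}_{XY}^2(\omega) + \hat{q}_{XY}^2(\omega)\right]^{1/2} \qquad (6.12)$$

$$\hat{K}^2_{XY}(\omega) = \frac{\hat{c}_{XY}^2(\omega) + \hat{q}_{XY}^2(\omega)}{\hat{f}_X(\omega)\hat{f}_Y(\omega)} \qquad (6.13)$$

$$\hat{G}_{XY}(\omega) = \frac{\hat{A}_{XY}(\omega)}{\hat{f}_X(\omega)} \qquad (6.14)$$

and

$$\hat{\phi}_{XY}(\omega) = \tan^{-1}\left[\frac{-\hat{q}_{XY}(\omega)}{\hat{c}_{XY}(\omega)}\right] \qquad (6.15)$$

The time lag can then be estimated by equation (6.9) with the estimate
of the phase spectrum from (6.15).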

The truncation point $M_{XY}$ and the lag window $\lambda_{XY}(k)$ are
chosen in a similar way to those in the univariate spectral analysis.
In fact, we always begin the cross-spectral analysis with a careful
univariate spectral analysis of each individual time series. The
cross-spectral analysis is interesting and meaningful only when the
univariate spectra are significant and contain enough power in one or
both series. It should also be noted that the estimate of the phase
spectrum is not reliable and meaningful when the coherence is small.
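
A minimal sketch of the estimation steps (6.10) through (6.15) is given below. Several things here are assumptions made for illustration: the Bartlett lag window is used for all three spectra, the two series are synthetic sinusoids with a known two-period delay (they are not the natural gas data), and the function names are hypothetical.

```python
import cmath
import math

def bartlett(u):
    """Continuous triangular function (5.23b)."""
    return max(0.0, 1.0 - abs(u))

def sample_cross_covariance(x, y, k):
    """Sample cross-covariance (6.10) with 0-based indexing and divisor N."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    t_range = range(0, n - k) if k >= 0 else range(-k, n)
    return sum((x[t] - mean_x) * (y[t + k] - mean_y) for t in t_range) / n

def smoothed_cross_spectrum(x, y, M, omega, window=bartlett):
    """Estimate (6.11); calling it with y = x gives a smoothed univariate spectrum."""
    total = sum(window(k / M) * sample_cross_covariance(x, y, k)
                * cmath.exp(-1j * omega * k) for k in range(-M, M + 1))
    return total / (2.0 * math.pi)

def cross_spectral_estimates(x, y, M, omega):
    f_xy = smoothed_cross_spectrum(x, y, M, omega)
    f_x = smoothed_cross_spectrum(x, x, M, omega).real
    f_y = smoothed_cross_spectrum(y, y, M, omega).real
    amplitude = abs(f_xy)                   # (6.12)
    # f_xy = c - i*q, so the complex argument equals atan2(-q, c) as in (6.15).
    phase = cmath.phase(f_xy)
    return {
        "amplitude": amplitude,
        "coherence": amplitude**2 / (f_x * f_y),  # (6.13)
        "gain": amplitude / f_x,                  # (6.14)
        "phase": phase,
        "lead_time": -phase / omega,              # (6.9)
    }

# Synthetic check: Y is X delayed by two periods, both with a 12-period cycle.
N, delay = 240, 2
w0 = 2 * math.pi / 12
x = [math.sin(w0 * t) for t in range(1, N + 1)]
y = [math.sin(w0 * (t - delay)) for t in range(1, N + 1)]
estimates = cross_spectral_estimates(x, y, M=24, omega=w0)
```

With these inputs the estimated phase at the cycle frequency is negative and the estimated lead time is close to the true delay of two periods, while the coherence is close to one, so the phase estimate is meaningful in the sense described above.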

The sampling properties of the estimates 
of various cross-spectral functions are clearly 
related to the weighting function used in the 
smoothing. Because of limitations of space, we 
will not discuss them, and we instead refer 
interested readers to Peter Bloomfield (2000), 
David R. Brillinger (1975), and Maurice B. 
Priestley (1981). 


Example 4 

For illustration, we consider two monthly time series of spot prices
of natural gas in Louisiana ($X_t$) and Oklahoma ($Y_t$) between
January 1988 and October 1991 shown in Figure 35.6. The spot price for
Louisiana natural gas was known on or before the first day of trading
on the Oklahoma market. The datasets can be found in Wei (2006,
p. 579).

Figure 35.6 Spot prices of natural gas in Louisiana (solid line) and
Oklahoma (dotted line) between January 1988 and October 1991 (vertical
axis: spot price; horizontal axis: date, January 1988 to January 1992)


As shown in Figure 35.6, both series $X_t$ and $Y_t$ are stationary.
We begin with a periodogram analysis for each individual series. The
results indicate that each series contains a clear periodic component
with a fundamental period of 12 months. We then use SAS to calculate
the estimates of various cross-spectral functions with the Tukey
window. The coherence and the phase spectrum are shown in Figures 35.7
and 35.8, respectively. The coherence remains substantially strong at
all frequencies, implying that during the study period between January
1988 and October 1991 the correlation between the $\omega$-frequency
component of $X_t$ and the $\omega$-frequency component of $Y_t$ is
strong. The phase spectrum is nearly 0 at frequencies between 0 and
$2\pi/3$ and negative at higher frequencies. This implies that during
the same study period the two series are very much in alignment at low
frequencies between 0 and $2\pi/3$, but series $X_t$ leads series $Y_t$
at high frequencies. In other words, during the study period the spot
price of Louisiana natural gas leads the spot price of Oklahoma
natural gas for a very short period of time.

Figure 35.7 The coherence of Louisiana and Oklahoma spot prices of
natural gas (vertical axis: $K^2$; horizontal axis: frequency from 0 to
$\pi$)

Figure 35.8 The phase spectrum of Louisiana and Oklahoma spot prices
of natural gas


7 Some mathematical detail 


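Let $\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, [N/2]$, be the Fourier
frequencies. We have used the following trigonometric identities in
the derivation of the Fourier series representation and the
periodogram analysis of a series:

$$\sum_{t=1}^{N}\cos(\omega_k t) = \begin{cases} N, & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad (7.1)$$

$$\sum_{t=1}^{N}\sin(\omega_k t) = 0, \quad \text{all } k \qquad (7.2)$$

$$\sum_{t=1}^{N}\cos(\omega_k t)\cos(\omega_j t) = \begin{cases} N, & k = j = 0 \text{ or } N/2 \ (N \text{ even}) \\ N/2, & k = j \ne 0 \text{ or } N/2 \ (N \text{ even}) \\ 0, & k \ne j \end{cases} \qquad (7.3)$$

$$\sum_{t=1}^{N}\sin(\omega_k t)\sin(\omega_j t) = \begin{cases} 0, & k = j = 0 \text{ or } N/2 \ (N \text{ even}) \\ N/2, & k = j \ne 0 \text{ or } N/2 \ (N \text{ even}) \\ 0, & k \ne j \end{cases} \qquad (7.4)$$

and

$$\sum_{t=1}^{N}\cos(\omega_k t)\sin(\omega_j t) = 0, \quad \text{for all } k \text{ and } j \qquad (7.5)$$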
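
Before turning to the analytical argument, the identities (7.1) through (7.5) can be checked by brute force. The following sketch is only a numerical sanity check with a hypothetical function name; it returns the largest absolute deviation from the stated right-hand sides over all pairs of Fourier frequencies.

```python
import math

def max_identity_error(N):
    """Largest deviation of the sums in (7.1)-(7.5) from their stated values."""
    half = N // 2
    omega = [2.0 * math.pi * k / N for k in range(half + 1)]
    times = range(1, N + 1)
    worst = 0.0
    for k in range(half + 1):
        at_half_n = (N % 2 == 0 and k == half)   # the frequency pi, only if N is even
        sum_cos = sum(math.cos(omega[k] * t) for t in times)
        sum_sin = sum(math.sin(omega[k] * t) for t in times)
        worst = max(worst, abs(sum_cos - (N if k == 0 else 0.0)), abs(sum_sin))
        for j in range(half + 1):
            cc = sum(math.cos(omega[k] * t) * math.cos(omega[j] * t) for t in times)
            ss = sum(math.sin(omega[k] * t) * math.sin(omega[j] * t) for t in times)
            cs = sum(math.cos(omega[k] * t) * math.sin(omega[j] * t) for t in times)
            if k != j:
                want_cc = want_ss = 0.0
            elif k == 0 or at_half_n:
                want_cc, want_ss = float(N), 0.0
            else:
                want_cc = want_ss = N / 2.0
            worst = max(worst, abs(cc - want_cc), abs(ss - want_ss), abs(cs))
    return worst
```

Calling `max_identity_error(12)` or `max_identity_error(7)` covers an even and an odd sample size; in both cases the deviation is at the level of floating-point rounding.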


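To see these, we use the Euler relation

$$e^{i\omega_k} = \cos(\omega_k) + i\sin(\omega_k) \qquad (7.6)$$

It follows that

$$\sin(\omega_k) = \frac{e^{i\omega_k} - e^{-i\omega_k}}{2i} \qquad (7.7)$$

and

$$\cos(\omega_k) = \frac{e^{i\omega_k} + e^{-i\omega_k}}{2} \qquad (7.8)$$

Now,

$$\sum_{t=1}^{N} e^{i\omega_k t} = e^{i\omega_k}\,\frac{1 - e^{i\omega_k N}}{1 - e^{i\omega_k}} = e^{i\omega_k}\,\frac{e^{i\omega_k N} - 1}{e^{i\omega_k} - 1}$$

$$= e^{i\omega_k}\,\frac{e^{i\omega_k N/2}\left(e^{i\omega_k N/2} - e^{-i\omega_k N/2}\right)/2i}{e^{i\omega_k/2}\left(e^{i\omega_k/2} - e^{-i\omega_k/2}\right)/2i} = e^{i\omega_k(N+1)/2}\,\frac{\sin(\omega_k N/2)}{\sin(\omega_k/2)}$$

$$= \cos\left(\frac{\omega_k(N+1)}{2}\right)\frac{\sin(\omega_k N/2)}{\sin(\omega_k/2)} + i\sin\left(\frac{\omega_k(N+1)}{2}\right)\frac{\sin(\omega_k N/2)}{\sin(\omega_k/2)} \qquad (7.9)$$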


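However, from (7.6), we have

$$\sum_{t=1}^{N} e^{i\omega_k t} = \sum_{t=1}^{N}\cos(\omega_k t) + i\sum_{t=1}^{N}\sin(\omega_k t)$$

Equations (7.1) and (7.2) follow because

$$\sum_{t=1}^{N}\cos(\omega_k t) = \cos\left(\frac{\omega_k(N+1)}{2}\right)\frac{\sin(\omega_k N/2)}{\sin(\omega_k/2)} = \begin{cases} N, & k = 0 \\ 0, & k \ne 0 \end{cases}$$

and

$$\sum_{t=1}^{N}\sin(\omega_k t) = \sin\left(\frac{\omega_k(N+1)}{2}\right)\frac{\sin(\omega_k N/2)}{\sin(\omega_k/2)} = 0, \quad k = 0, 1, \ldots, [N/2]$$

where we note that

$$\frac{\sin(\omega_k N/2)}{\sin(\omega_k/2)} = \frac{\sin(\pi k)}{\sin(\pi k/N)} = \begin{cases} N, & k = 0 \\ 0, & k \ne 0 \end{cases}$$

$$\cos\left(\frac{\omega_0(N+1)}{2}\right) = 1, \quad \text{and} \quad \sin\left(\frac{\omega_0(N+1)}{2}\right) = 0$$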


Equations (7.3), (7.4), and (7.5) follow immediately from (7.1),
(7.2), and the following trigonometric identities

$$\cos(\omega_k)\cos(\omega_j) = \frac{1}{2}\left\{\cos(\omega_k + \omega_j) + \cos(\omega_k - \omega_j)\right\} \qquad (7.10)$$

$$\sin(\omega_k)\sin(\omega_j) = \frac{1}{2}\left\{\cos(\omega_k - \omega_j) - \cos(\omega_k + \omega_j)\right\} \qquad (7.11)$$

and

$$\cos(\omega_k)\sin(\omega_j) = \frac{1}{2}\left\{\sin(\omega_k + \omega_j) - \sin(\omega_k - \omega_j)\right\} \qquad (7.12)$$


Chapter 36


Time-series techniques for repeated 
cross-section data 
David Sanders and Hugh Ward 


This chapter is concerned primarily with tech- 
niques that can be used to analyze aggregate 
data which describe different individuals over 
time. Such “repeated cross-section” data may 
be drawn from opinion surveys conducted by 
commercial polling agencies which interview 
a “new” random sample of respondents each 
month in order to ask them the same set of 
survey questions. Typically, the percentage of 
respondents who answer a given question in 
a particular way (e.g., the percentage answer- 
ing “Conservative” in response to the ques- 
tion, “Which political party would you vote 
for if there were a general election tomor- 
row?”) varies over time. This provides the 
researcher with an aggregate time series which, 
in principle, can be related empirically either 
to other attitudinal time series (e.g., responses 
to questions about consumer confidence) or to 
“objective” features of the economic and politi- 
cal environment, such as employment, interest 
rates and inflation. 

However, a number of different techniques may be used for analyzing
aggregate time-series data, and the choice between the techniques
must depend ultimately on the kind of epistemological assumptions
about the nature of “explanation” that the researcher is prepared to
make in formulating a statistical model. The first four sections of
this chapter review the main approaches to time-series modeling. The
subsequent section applies the four techniques to the same data set
and shows how the choice of technique can affect the statistical
results obtained. The penultimate section briefly reviews the main
strengths and weaknesses of the different techniques, concentrating
particularly on the different epistemological assumptions that they
make.

Reprinted from Analyzing Social and Political Change, Angela Dale and
Richard Davies, eds, SAGE Publications, 1994.

The four techniques for time-series analysis 
described below are all based upon the linear 
model. All allow for the complex multivariate 
analysis of interval-level data, making provi- 
sion for the estimation of the effects exerted by 
a range of continuous and categorical explana- 
tory variables. All can be used for analyzing 
historical data and for forecasting purposes. 


1 The simple ordinary least squares 
(OLS) method 


The form of the conventional regression (or OLS) 
model for cross-sectional data is well known: 


$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e \qquad (1)$$

where $y$ is the response variable, $x_1, x_2, \ldots, x_k$ are
explanatory variables, $e$ is a normally distributed random error
term, and the cases may be individuals or social aggregates.

With aggregate time-series data, the OLS model becomes

$$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + u_t \qquad (2)$$

where the $t$ subscripts indicate that the $y$ and $x$ variables are
measured over time; $u_t$ defines the error term; and the cases are
time points defined by the period for which data are available. The
fundamental problem with OLS in this situation is that the $u_t$ tend
not to be independent,¹ violating an important assumption upon which
conventional methods of analysis depend.

The “serial correlation” in the error may often (but not always) be
approximated by a “first-order autoregressive process” or AR(1) in
which

$$u_t = \rho u_{t-1} + e_t \qquad (3)$$

where $u_t$ and $u_{t-1}$ are the (systematic) errors from an OLS
time-series regression and $e_t$ is an independently distributed error
term. Such serially correlated error does not bias the OLS parameter
estimates, but the standard errors of the coefficients will, in
general, be underestimated and the $R^2$ overestimated. As a
consequence, the risk of accepting a false hypothesis is increased.
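
A first check for an AR(1) error process of the form (3) is to regress the OLS residuals on their own lagged values. The sketch below is a minimal illustration with hypothetical names; it uses a regression through the origin (an assumption that is reasonable because OLS residuals have mean zero when the model includes a constant) and is not a substitute for a formal test.

```python
def ar1_coefficient(residuals):
    """Least squares estimate of rho in u_t = rho * u_{t-1} + e_t (no intercept)."""
    numerator = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    denominator = sum(u * u for u in residuals[:-1])
    return numerator / denominator

# Artificial residuals that halve each period, so each one is 0.5 times its predecessor.
toy_residuals = [0.5**t for t in range(10)]
rho_hat = ar1_coefficient(toy_residuals)
```

An estimate well away from zero suggests that the conventional OLS standard errors should not be trusted without correcting for the error process.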

On the other hand, serially correlated error may be a symptom of
other misspecifications. These could include the omission of
important explanatory variables which are correlated with variables in
the model. They could also include failure to represent “feedback”
dependence in which the level of response is dependent upon previous
values of the response variable. If OLS methods are used in these
circumstances, misleading results may be expected not only for
standard errors but also for the parameter estimates.

¹ Given that with most time-series data, $y_{t-1}$ is generally a good
predictor of $y_t$, it follows that $u_{t-1}$ is likely to be a good
predictor of $u_t$. In other words, $u_t$ and $u_{t-1}$ are likely to
be highly correlated.


2 Autoregressive (maximum 
likelihood) models 


The autoregressive model offers the most immediate and obvious
solution to the problem of serially correlated error: if serial
correlation is distorting the standard errors of a given OLS model,
why not attempt to specify the nature of the distorting
“autoregressive” process and re-estimate the model taking account of
that process? This should also result in more efficient estimation of
the $\beta$ parameters. A pragmatic approach is to fit an OLS model and
then examine the pattern of intercorrelation among the estimated
$u_t, u_{t-1}, u_{t-2}, \ldots$. If $u_t$ correlates strongly only with
$u_{t-1}$, then the relevant “error process” can be described as an
AR(1); if $u_t$ correlates strongly only with $u_{t-1}$ and $u_{t-2}$,
then the process is an AR(2); and so on.² The chosen error process is
then incorporated into the estimation procedure; maximum likelihood
(or asymptotically equivalent) methods for AR models are available in
many standard software packages. If the specified autoregressive
structure is correct, the estimation problems associated with OLS
largely disappear: estimated standard errors are not deflated and, as
a result, significance testing becomes a reliable exercise. This in
turn means that the risks of wrongly accepting a false hypothesis
(which spuriously links $y_t$ to some $x_t$) are kept to a minimum.
This said, autoregressive models have been criticized on two main
grounds.

² If $u_t$ correlates only with, say, $u_{t-3}$, and not with
$u_{t-1}$ or $u_{t-2}$, then this can be regarded as a “restricted
AR(3)” model and estimated accordingly. See Pesaran and Pesaran (1987,
pp. 69-70, 150-51).

First, the $\rho_i$ coefficients on the $u_{t-1}, \ldots, u_{t-k}$
terms are often difficult to interpret in substantive terms.³ Second,
models based purely on autoregressive techniques may simply be
misspecified; in particular, they may have omitted important exogenous
or endogenous variables which need to be included explicitly in the
model rather than incorporated implicitly via the “catch-all”
autoregressive structure. These criticisms are all the more potent in
the case of “restricted” autoregressive models where $u_t$ appears to
be a function of, say, $u_{t-5}$ but not of $u_{t-1}, \ldots, u_{t-4}$.
Not only is such a result difficult to interpret substantively, but it
also suggests that an additional exogenous variable (operating with a
lag of around five time points) should be included explicitly in the
model. In response, of course, the advocate of autoregressive
techniques can argue that any autoregressive term is simply being
employed instrumentally in order to obtain an accurate assessment of
the magnitude of the effect of $x_{1t}, \ldots, x_{kt}$ on $y_t$, and
that the question of the substantive meaning either of $\rho_i$ or of
$u_{t-i}$ is therefore irrelevant.


3 The lagged endogenous variable 
OLS method 


The defining feature of this technique is the inclusion of a term for
$y_{t-1}$ on the right-hand side of any equation which tries to
predict $y_t$. This is in addition to the hypothesized effects of any
exogenous variables which also need to be included. The basic form of
the model is

$$y_t = \beta_0 + \alpha y_{t-1} + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + u_t \qquad (4)$$


where $y_t$ is the response (or endogenous) variable; $y_{t-1}$ is the
endogenous variable lagged by one time point; $x_{1t}, \ldots, x_{kt}$
are explanatory (exogenous) variables (which may exert lagged rather
than simultaneous effects as shown in this example); and $u_t$ is
random error.

³ Although $\rho_i$ can be taken to suggest the rate at which the past
is discounted, it is still difficult to give substantive meaning to
the expression $(u_t - \rho u_{t-1})$.

Specifying the model in this way has two significant advantages. The
first is that it frequently circumvents the problem of serially
correlated error associated with simple OLS. The second advantage
derives from the fact that $y_{t-1}$ summarizes all the past effects of
unmeasured variables (i.e., variables external to the model) on $y_t$
(see Johnston, 1972, pp. 292-320). This means not only that the
effects of measured variables ($x_{1t}, \ldots, x_{kt}$) on $y_t$ can be
estimated more accurately than would otherwise be the case, but also
that the coefficient on $y_{t-1}$ ($\alpha$) represents the “discount
rate”, that is, the rate at which past influences on $y_t$ decay. This
latter feature of the model is particularly useful for specifying the
rate of decay of “intervention effects”, such as the occurrence of
particular political events. As shown below, for example, the
Falklands War boosted government popularity in May 1982 by some 8.6%.
The coefficient on the lagged dependent variable ($\alpha = 0.83$)
enables us to infer that about 83% of this effect carried over from
each month to the next. This implies that the “May boost” was worth
8.6 x 0.83 = 7.1% in June; 7.1 x 0.83 = 5.9% in July; and so on. It
should be noted, however, that with this sort of model specification
the effects of all measured exogenous variables are constrained to
decay at the same rate. As discussed below, one significant advantage
of Box-Jenkins models is that they permit the specification of a
different decay rate for each exogenous variable.
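
The geometric decay implied by the coefficient on the lagged endogenous variable is easy to trace out. The sketch below uses the two figures quoted in the text (an initial effect of 8.6 points and a coefficient of 0.83); the function names and the half-life summary are illustrative additions, not part of the chapter's analysis.

```python
import math

def intervention_path(initial_effect, alpha, periods):
    """Effect remaining j periods after the intervention: initial_effect * alpha**j."""
    return [initial_effect * alpha**j for j in range(periods)]

def half_life(alpha):
    """Number of periods for the effect to fall to half, for 0 < alpha < 1."""
    return math.log(0.5) / math.log(alpha)

path = intervention_path(8.6, 0.83, 4)   # the month of the boost, then three more months
months_to_halve = half_life(0.83)
```

Under these assumptions the boost falls to about half its initial size a little under four months after the event, which gives a more intuitive summary of the discount rate than the coefficient itself.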

Three other points need to be made about the lagged endogenous
variable method. First, as with the simple OLS specification outlined
earlier, the estimated coefficients may not be stable over time; i.e.,
they may take on radically different values if they are estimated over
different subsets of the entire time series. A series of diagnostic
tests for parameter stability is available (CUSUM, CUSUMSQ and
recursive coefficient tests), and in general these should be applied
systematically if either simple OLS or lagged endogenous variable
methods are being used (see Brown et al., 1975). If a particular model
fails these tests, then it is probably misspecified and requires
either the inclusion of further exogenous variables or a
respecification of the ways in which the existing
$x_{1t}, \ldots, x_{kt}$ are hypothesized to affect $y_t$. Second, even
with the inclusion of $y_{t-1}$ in the equation, it is still possible
that the error term from (4) will be subject to serial correlation.
For example, one of the following could be the case:

$$u_t = \rho u_{t-1} + e_t \qquad (5)$$

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_k u_{t-k} + e_t \qquad (6)$$

$$u_t = \rho u_{t-2} + e_t \qquad (7)$$

where (5) denotes a first-order autoregressive error process, (6)
denotes a kth-order process, and (7) denotes a “restricted”
second-order process. In any of these cases, as with simple OLS, some
sort of correction for the error process is required. This can be
achieved, of course, by incorporating an appropriate autoregressive
error function in the model, although such a strategy carries with it
all the limitations of the autoregressive model which were noted
earlier.

Finally, it is worth observing that the lagged endogenous variable
method described here represents a subspecies of the general to
specific “Hendry” methodology followed by many UK econometricians
(see, e.g., Hendry, 1983). This approach seeks to specify the
short-run dynamics of time-series relationships by moving from a
general model specification (which, in addition to $y_{t-1}$, includes
all potential exogenous influences on $y_t$ at all theoretically
plausible lags) to a more limited, empirically determined
specification which eliminates all non-significant exogenous terms.
Thus, for example, a theoretical model which hypothesized that $y_t$
was influenced by $x_{1t}$ or $x_{2t}$, and in which it was assumed
that any changes in $x_{1t}$ or $x_{2t}$ would take no longer than
three periods to affect $y_t$, would initially be specified as

$$y_t = \beta_{00} + \alpha y_{t-1} + \beta_{10}x_{1t} + \beta_{11}x_{1t-1} + \beta_{12}x_{1t-2} + \beta_{13}x_{1t-3} + \beta_{20}x_{2t} + \beta_{21}x_{2t-1} + \beta_{22}x_{2t-2} + \beta_{23}x_{2t-3} + u_t \qquad (8)$$

If it turned out empirically that $x_{1t}$ influenced $y_t$ with a lag
of one time point while $x_{2t}$ influenced $y_t$ with a lag of two
time points, then the final specification would be

$$y_t = \beta_{00} + \alpha y_{t-1} + \beta_{11}x_{1t-1} + \beta_{22}x_{2t-2} + u_t \qquad (9)$$

though, as noted before, this specification would have to be checked
for both parameter stability and serially correlated error.


4 Box-Jenkins (ARIMA) methods 


Box-Jenkins techniques differ from the 
regression-based techniques outlined above in 
two significant respects: (1) in their emphasis 
upon the need for time-series data to be system- 
atically “pre-whitened” and the consequences 
this has for the way in which models are 
specified; and (2) in their facility for handling 
complex “intervention” specifications. We 
discuss each of these features in turn. 


4.1 Pre-whitening and model specification 


The basic data analytic principle underlying the Box-Jenkins
methodology is that $x_{t-k}$ helps to explain $y_t$ in theoretical
terms only if it explains variance in $y_t$ over and above the extent
to which $y_t$ is explained by its own past values. The application of
this principle in turn means that Box-Jenkins methods necessarily
place great emphasis on the need to establish the precise nature of
the “process” that is “self-generating” $y_t$. This is effected by the
use of autocorrelation and partial autocorrelation functions which
enable the analyst to determine what sort of “autoregressive” or
“moving average” process (respectively, AR and MA processes,
contributing to the ARIMA mnemonic) is generating $y_t$.⁴ Once the
self-generated sources of $y_t$ have been specified, the analyst can
then introduce an exogenous variable $x_t$ into the model in the form
of a “transfer function”. The precise lag structure for the effects of
$x_t$ is determined by the use of a “cross-correlation function” which
correlates the non-self-generated variation in $y_t$ with the
non-self-generated variation in $x_t$ over a range of different lags
and leads. If $x_{t-k}$ (i.e., $x_t$ at the specified lag or lags)
yields a significant coefficient and produces a nontrivial reduction
in the “residual mean square” of the transfer function (in other
words, if $x_{t-k}$ adds to the variation in $y_t$ that is explained by
the model as a whole), then it can be concluded that $x_{t-k}$ does
indeed exert an exogenous influence on $y_t$.

⁴ For an accessible introduction to autocorrelation and partial
autocorrelation functions, and indeed to Box-Jenkins techniques
generally, see Liu (1990). The AR(1) model can be expressed as
$y_t = \alpha y_{t-1} + u_t$, where $u_t$ is a disturbance term and
$\alpha$ is the parameter of the model. For an MA(1), the model can be
written as $y_t = u_t - \rho u_{t-1}$, where $u_t$ and $u_{t-1}$ are
disturbance terms and $\rho$ is the parameter of the model.

An important prerequisite of this modelling strategy is that all data
which are to be used need to be “pre-whitened”. This means that,
before they are included in any transfer function analysis, all
variables must be rendered mean and variance stationary: any trends in
the component variables must be removed prior to analysis. This is
normally effected by “differencing”, where a first difference of $y_t$
is defined as

$$\nabla y_t = y_t - y_{t-1} \qquad (10)$$

and where a second difference of $y_t$ is defined as

$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} \qquad (11)$$

Simple linear trends can usually be removed by first differencing; a
decline-recovery (or rise-decline) trend by second differencing; and
so on (see Appendix).
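
Differencing as defined in (10) and (11) can be sketched in a few lines. The function name and the two artificial series are assumptions made for illustration: a linear trend becomes constant after one difference, and a quadratic (decline-recovery or rise-decline) trend becomes constant after two.

```python
def difference(series, order=1):
    """Apply the first-difference operation of (10) `order` times."""
    result = list(series)
    for _ in range(order):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result

linear_trend = [2 + 3 * t for t in range(6)]        # rises by 3 each period
quadratic_trend = [(t - 3) ** 2 for t in range(8)]  # declines, then recovers

first_diff = difference(linear_trend, order=1)
second_diff = difference(quadratic_trend, order=2)
```

Each round of differencing shortens the series by one observation, which is worth keeping in mind when the available time series is short.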

Given the assumption of pre-whitened data, 
the form of the Box-Jenkins model is relatively 
straightforward. Two matters complicate any 
presentation of it, however. First, Box-Jenkins 
techniques not only allow for the estimation of 
the direct effect of a change in x, on y, (analo- 
gous to the B coefficients in (4)), but also allow 
for the estimation of adjustment to steady state 
or “discount” parameters (analogous to the a 
coefficient in (4)) associated with those direct 
effects. One particularly attractive feature of 
the Box-Jenkins specification is that it permits, 
in effect, a different discount rate to be esti- 
mated for each exogenous variable. This can be 
a significant advantage over the lagged exoge- 
nous variable specification, which, as noted 
earlier, constrains the discount rate to be identi- 
cal for all exogenous variables. What all of this 
means is that in the Box-Jenkins model there 
are potentially two parameters associated with 
each exogenous variable: an $\omega_i$ parameter, which
measures the direct effect of $x_{it}$ on $y_t$; and a
$\delta_i$ parameter, which (in general) measures the
rate at which the direct effect decays over time.

The second complicating aspect of the Box- 
Jenkins approach is the highly compressed 
nature of the notation, which often makes inter- 
pretation difficult for the nontechnical reader. 
For one thing, expositions of the method almost 
invariably employ the “backshift operator” B, 
where $By_t$ means $y_{t-1}$; $B^2 y_t$ means $y_{t-2}$; and
so on. For another, a general statement of the 
model would require rather more elaboration 
about the nature of AR and MA processes than 
can be developed here. We will therefore seek to 
illustrate the character of the model by the use 
of a hypothetical example. Consider a situation 
in which (a) a stationary endogenous variable $y_t$
is influenced by two stationary exogenous variables
$x_{1t}$ and $x_{2t}$; (b) $x_{1t}$ affects $y_t$ with a lag of
one time point; and (c) $x_{2t}$ affects $y_t$ with a lag of
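three time points. The “compressed” statement
of this model would be:

$$y_t = \beta_0 + \frac{\omega_{11} B}{(1 - \delta_1 B)} x_{1t} + \frac{\omega_{23} B^3}{(1 - \delta_2 B)} x_{2t} + u_t \qquad (12)$$

Without the use of the backshift operator, (12)
becomes

$$y_t = \beta_0' + (\delta_1 + \delta_2) y_{t-1} - \delta_1 \delta_2 y_{t-2} + \omega_{11}(x_{1,t-1} - \delta_2 x_{1,t-2}) + \omega_{23}(x_{2,t-3} - \delta_1 x_{2,t-4}) + u_t' \qquad (13)$$

where

$$\beta_0' = \beta_0 (1 - \delta_1 - \delta_2 + \delta_1 \delta_2)$$

$$u_t' = u_t - \delta_1 u_{t-1} - \delta_2 u_{t-1} + \delta_1 \delta_2 u_{t-2}$$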


and where $y_t$, $x_{1t}$ and $x_{2t}$ are measured variables;
$\beta_0$ is a constant; $\omega_{11}$ is the direct effect parameter
for $x_{1,t-1}$; $\omega_{23}$ is the direct effect parameter for
$x_{2,t-3}$; $\delta_1$ and $\delta_2$ are the decay parameters for $\omega_{11}$
and $\omega_{23}$ respectively; and $u_t$ is a “white noise”
(random) error component. With appropriate
software (for example the BMDP-2T program), 
the estimation of the model described in (12), as 
well as rather more complex models, becomes 
a relatively straightforward exercise. Evaluating
such models is largely a matter of examining
the significance levels of the estimated parameters,
checking that the transfer function model
has a lower residual mean square value than the
simple self-generating ARIMA model for $y_t$, and
ensuring that $u_t$ is indeed a white noise term in
which no ARIMA process is evident. Provided
that the model under analysis satisfies each of
these conditions, it may be concluded that the
specified exogenous variables do indeed affect
$y_t$, and that the nature of their impact is defined
by the appropriate $\omega_i$ and $\delta_i$ parameters.
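
The equivalence between the compressed form (12) and the expanded form (13) can be checked by simulation. The sketch below is illustrative only: it assumes NumPy, uses randomly generated series and arbitrary parameter values, treats all pre-sample values as zero, and the helper names `lag` and `decay_filter` are hypothetical. It is not the estimation procedure of any particular package.

```python
import numpy as np

def lag(x, k):
    """Return B^k x: the series shifted back k >= 1 periods, zero-padded."""
    out = np.zeros_like(x)
    out[k:] = x[:len(x) - k]
    return out

def decay_filter(x, delta):
    """Apply 1/(1 - delta*B): f_t = x_t + delta * f_{t-1}, with f_{-1} = 0."""
    f = np.zeros_like(x)
    for t in range(len(x)):
        f[t] = x[t] + (delta * f[t - 1] if t > 0 else 0.0)
    return f

rng = np.random.default_rng(0)
n = 200
x1, x2, u = rng.normal(size=(3, n))                 # hypothetical data
beta0, w11, w23, d1, d2 = 2.0, 0.5, -0.3, 0.6, 0.8  # assumed parameters

# Compressed form (12): each x passes through its own decay filter.
y12 = (beta0
       + w11 * decay_filter(lag(x1, 1), d1)
       + w23 * decay_filter(lag(x2, 3), d2)
       + u)

# Expanded form (13), using the first two values of y12 as start values.
beta0_prime = beta0 * (1 - d1 - d2 + d1 * d2)
u_prime = u - d1 * lag(u, 1) - d2 * lag(u, 1) + d1 * d2 * lag(u, 2)
x1_term = w11 * (lag(x1, 1) - d2 * lag(x1, 2))
x2_term = w23 * (lag(x2, 3) - d1 * lag(x2, 4))

y13 = y12.copy()
for t in range(2, n):
    y13[t] = (beta0_prime + (d1 + d2) * y13[t - 1] - d1 * d2 * y13[t - 2]
              + x1_term[t] + x2_term[t] + u_prime[t])

print(np.allclose(y12, y13))
```

The two series agree to numerical precision, which illustrates that (13) is simply (12) multiplied through by $(1 - \delta_1 B)(1 - \delta_2 B)$.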


4.2 Complex intervention specifications 
in Box-Jenkins analysis 


It was noted earlier that “intervention” effects
can be estimated using the lagged endogenous
variable method. Using this model, an
intervention effect (such as the beginning of a
war or the introduction of a new piece of legislation)
is operationalized as a dummy variable
$I_t$, which takes the value unity for the single
time point when the intervention begins and
zero otherwise. For any given $y_t$, this yields

$$y_t = \beta_0 + \alpha y_{t-1} + \beta_1 I_t + u_t \qquad (14)$$

The estimated coefficient on $I_t$ represents the
increase or decrease in $y_t$ associated with
the intervention; the coefficient on $y_{t-1}$ denotes
the rate at which the intervention effect decays.
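
The way in which the coefficient on $y_{t-1}$ governs the decay of a one-period intervention in (14) can be traced with a short deterministic simulation. This is a sketch under stated assumptions: the disturbance term is set to zero, the series starts at its steady state, and the function name `lev_path` and all parameter values are hypothetical.

```python
import numpy as np

def lev_path(beta0, alpha, beta1, pulse_at, n):
    """Path of (14) with u_t = 0, started at the steady state beta0/(1-alpha)."""
    steady = beta0 / (1 - alpha)
    y = np.empty(n)
    prev = steady
    for t in range(n):
        pulse = 1.0 if t == pulse_at else 0.0   # dummy I_t: one at a single point
        y[t] = beta0 + alpha * prev + beta1 * pulse
        prev = y[t]
    return y, steady

# Illustrative values only.
beta0, alpha, beta1, pulse_at = 5.0, 0.8, 4.0, 10
y, steady = lev_path(beta0, alpha, beta1, pulse_at, 30)

# Deviation from the steady state after the intervention.
deviation = y - steady
print(deviation[pulse_at:pulse_at + 4])
```

In this sketch the deviation equals $\beta_1$ in the intervention period and is multiplied by $\alpha$ in each subsequent period, so that $k$ periods later it is $\beta_1 \alpha^k$.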

Box-Jenkins models allow for several addi- 
tional ways of specifying such intervention 
effects. In the ensuing review we translate the 
models into a notation familiar to users of 
regression techniques. 


The gradual-permanent model 

In this model an exogenous variable intervenes
at some time $t^*$, firstly, to change $y_t$ immediately
by some given amount, and secondly,
to further change $y_t$ through time in a way
that approaches some upper limit. Compare
the two models shown in Figures 36.1 and
36.2. The step function in Figure 36.1 is the
standard dummy variable model of classical
regression analysis, the dummy variable having
the same (positive) impact on $E(y_t)$ (the
expected value of $y_t$) for all values of $t$ greater
than or equal to $t^*$. In contrast, the gradual-permanent
model postulates a build-up of the
intervention effect over time. Such a gradual
build-up frequently seems more theoretically
plausible than the “abrupt” effect modeled by
a straightforward dummy: a new piece of race
relations legislation, for example, may inhibit
discrimination against ethnic minorities to
only a limited degree initially, but the effects of
the legislation may build gradually over time.

[Figure 36.1 The dummy variable step-function model. Vertical axis: $E(y_t)$; horizontal axis: Time.]

The simplest way of writing the gradual-permanent
model is⁵

$$y_t = N_t \quad \text{for } t < t^* \qquad (15)$$

$$y_t = \omega \sum_{r=0}^{t-t^*} \delta^r + N_t \quad \text{for } t \ge t^*,\ 0 < \delta < 1 \qquad (16)$$

where $y_t$ is a measured variable; $N_t$ is the
ARIMA process self-generating $y_t$; $\omega$ is the initial
intervention parameter; and $\delta$ is the “adjustment”
parameter. As shown in Figure 36.2:

$$E(y_t) = \omega\delta^0 = \omega \quad \text{at time } t^*,$$

$$E(y_t) = \omega(\delta^0 + \delta^1) = \omega + \omega\delta \quad \text{at time } t^*+1,$$

$$E(y_t) = \omega(\delta^0 + \delta^1 + \delta^2) = \omega + \omega\delta + \omega\delta^2 \quad \text{at time } t^*+2,$$

and so on. Given that $\delta$ must be less than unity,
successive increments of $\delta^r$ become smaller and
smaller. As $\delta$ approaches unity, the effect grows
in an almost linear manner; as $\delta$ approaches
zero, the growth in $y_t$ tails off more and more
rapidly. The value of $\delta$, in short, is a parameter
for the rate at which increments to $y_t$ decay.
Clearly, since the increments form a geometric
progression, if $\delta < 1$ the series will converge on
an upper limit of $\omega/(1 - \delta)$. If $\omega$ is negative, the
intervention has a reductive effect on $y_t$ which
gradually increases in magnitude.

[Figure 36.2 The gradual-permanent model. Vertical axis: $E(y_t)$; horizontal axis: Time, marked at $t^*$, $t^*+1$, $t^*+2$, $t^*+3$, $t^*+4$.]

⁵In the Box-Jenkins notation, $y_t = [\omega/(1 - \delta B)] I_t + N_t$.
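
The build-up described by (16) and its upper limit can be computed directly. The following is a minimal sketch assuming NumPy; the function name `gradual_permanent_effect` and the values of $\omega$ and $\delta$ are hypothetical, and the noise component $N_t$ is ignored.

```python
import numpy as np

def gradual_permanent_effect(omega, delta, periods):
    """Expected effect omega * sum_{r=0}^{k} delta**r for k = 0..periods-1."""
    return omega * np.cumsum(delta ** np.arange(periods))

omega, delta = 2.0, 0.6            # illustrative values only
effect = gradual_permanent_effect(omega, delta, 40)

increments = np.diff(effect)       # each increment is omega * delta**r
limit = omega / (1 - delta)        # upper limit of the geometric series

print(effect[:3])
print(limit)
```

The first three values correspond to $\omega$, $\omega + \omega\delta$ and $\omega + \omega\delta + \omega\delta^2$; the increments shrink geometrically and the cumulative effect approaches $\omega/(1-\delta)$.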


The abrupt-effect/gradual-decline model 
This specification is directly equivalent to the 
intervention effect associated with the lagged 
endogenous variable method summarized in 
(14). The model is most easily written as⁶

$$y_t = N_t \quad \text{for } t < t^*$$

$$y_t = \omega\delta^{(t-t^*)} + N_t \quad \text{for } t \ge t^*,\ 0 < \delta < 1$$

A specification of this sort, for a positive value
of $\omega$, is illustrated in Figure 36.3. The parameter
$\delta$ measures the rate of decay in the initial
effect; the smaller the value of $\delta$, the faster the
effect decays. As $t$ increases, the effect of the
intervention approaches zero.

[Figure 36.3 The abrupt-effect/gradual-decline model. Vertical axis: $E(y_t)$; horizontal axis: Time.]

⁶In the Box-Jenkins notation, $y_t = [\omega/(1 - \delta B)] \nabla I_t + N_t$.


The gradual-temporary model 

In this specification the intervention has an
immediate effect which builds up thereafter
to some maximum value, and then decays
gradually to zero. The simplest way of writing
the model for the case in which the
effect builds up for one period and then
decays is⁷

$$y_t = N_t \quad \text{for } t < t^*$$

$$y_t = \omega_0 + N_t \quad \text{for } t = t^* \qquad (17)$$

$$y_t = \omega_0\delta^{(t-t^*)} + \omega_1\delta^{(t-t^*-1)} + N_t \quad \text{for } t > t^*$$

where $0 < \delta < 1$ and where $\omega_0$ and $\omega_1$ both
have the same sign. As shown in Figure 36.4,
the intervention model here is merely the sum
of two abrupt-temporary models, with the first
intervention commencing at $t^*$ and the second
at $t^*+1$. As long as $\omega_1$ is greater than $\omega_0 - \delta\omega_0$,
the effect of the overall “summed” intervention
increases in the first post-intervention period
and then gradually declines.⁸
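
The decomposition of (17) into two abrupt, decaying interventions can be sketched numerically. The code below assumes NumPy, ignores the noise component $N_t$, and uses hypothetical function names and parameter values chosen only to illustrate the condition $\omega_1 > \omega_0 - \delta\omega_0$.

```python
import numpy as np

def abrupt_decay(omega, delta, start, n):
    """Effect omega * delta**(t - start) for t >= start, and zero before."""
    t = np.arange(n)
    steps = np.clip(t - start, 0, None)          # periods since the intervention
    return np.where(t >= start, omega * delta ** steps, 0.0)

def gradual_temporary(omega0, omega1, delta, t_star, n):
    """Sum of two abrupt decaying effects, starting at t_star and t_star + 1."""
    return (abrupt_decay(omega0, delta, t_star, n)
            + abrupt_decay(omega1, delta, t_star + 1, n))

# Illustrative values: omega1 = 5 exceeds omega0 - delta*omega0 = 1.6.
omega0, omega1, delta, t_star, n = 8.0, 5.0, 0.8, 3, 15
effect = gradual_temporary(omega0, omega1, delta, t_star, n)

# Counter-example: omega1 = 1 is below the threshold, so there is no build-up.
no_build_up = gradual_temporary(omega0, 1.0, delta, t_star, n)

print(effect[t_star:t_star + 4])
```

With the first set of values the effect rises between $t^*$ and $t^*+1$ and declines thereafter; with the second it declines from $t^*$ onwards.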

As these examples show, Box-Jenkins techniques
provide a variety of powerful and
plausible ways of modeling intervention effects.
However, it should be noted that, with the
exception of the gradual-permanent model,
similar intervention models can be specified
using the somewhat simpler lagged endogenous
variable technique.

[Figure 36.4 The gradual-temporary model: (a) total effect of $\omega_0$ and $\omega_1$ interventions; (b) effect of $\omega_0$ intervention; (c) effect of $\omega_1$ intervention. Vertical axis: $E(y_t)$; horizontal axis: Time.]

⁷In the Box-Jenkins notation, $y_t = [(\omega_0 + \omega_1 B)/(1 - \delta B)] \nabla I_t + N_t$.

⁸This model, like the abrupt-effect/gradual-decline
model, can be estimated using the lagged endogenous
variable method. The model described in Figure 36.4,
for example, could easily be estimated with the specification
$y_t = \beta_0 + \alpha y_{t-1} + \beta_1 I_t + \beta_2 I_{t-1} + u_t$, where $\beta_1$
and $\beta_2$ are directly equivalent to $\omega_0$ and $\omega_1$ in (17);
where $\alpha$ is analogous to $\delta$ in the same equation; where
$I_t$ is a dummy variable which takes on the value one
for the period of the intervention and zero otherwise;
and where $u_t$ is a random error term.


5 An application of different time
series to the same set of data: or how
different assumptions can produce
different conclusions


The particular example used here involves a 
problem that has interested political scientists 
throughout the postwar period: the question 
of the connections between the popularity of 
the incumbent government and the state of 
the domestic economy. The simple theoret- 
ical model that is investigated is shown in 
Figure 36.5. It hypothesizes that economic fac- 
tors affect government support in two different, 
if complementary, ways. One set of (“evalua- 
tive”) effects derives from the objective state of 
the economy as a whole and it is assumed that 
the better the overall performance of the econ- 
omy, the more likely it is that the incumbent 
government will be rewarded with a high level 
of popular support. In the model presented here 
we measure the overall strength/weakness of 
the economy as an aggregate of three variables: 
inflation, unemployment and import prices.⁹
The resultant “misery index”, which in effect
measures the weakness of the overall economy,
is predicted to exert a negative effect on government
popularity. The second set of economic
effects is rather more “subjective” and “instrumental”.
They are concerned with the extent to
which macroeconomic changes are perceived
by voters as affecting their own narrow self-interests.
If voters perceive that they are “doing
very well” as a result of current policies, then
they are more likely to lend their instrumental
support to the party in power. The model
assumes that “instrumental” judgements of this
sort can best be measured at the aggregate level
by the state of consumer confidence. If electors
are optimistic about their own family’s financial
prospects, then they will be more likely to support
the government whose policies produced
that optimism in the first place; economic pessimism,
in contrast, is likely to be associated
with reduced governmental support.¹⁰ Finally,
it should be noted that the model also anticipates
a political “intervention” effect: the Falklands
War. Given that government popularity
increased dramatically in the early months of
the war (particularly in May and June of 1982),
it makes sense to seek to assess how far the war
might have contributed to the Conservatives’
election victory in June 1983.

[Figure 36.5 Hypothesized model of the main
economic and political effects on UK government
popularity, 1979-87. A plus sign denotes a
predicted positive relationship, a minus sign a
predicted negative relationship; the objective effects
are combined to form a single misery index.
Diagram elements: evaluative objective effects (inflation,
unemployment, import prices); instrumental effects
(consumer confidence); Falklands effect; government popularity.]

⁹These variables were selected partly because of their
close connections with the overall level of economic
activity and partly because the first two at least have
consistently received a great deal of media attention in
the UK. All three were highly collinear (all bivariate
correlations above r = 0.9 during the period analyzed
here), which was why aggregation was considered necessary.
The variables were aggregated by standardizing
(to give each variable mean zero and unit standard
deviation) and summing. The index accordingly gives
equal weight to the three component variables.

¹⁰The consumer confidence measure employed here
is based on the following monthly Gallup question
which has been asked regularly since 1975: “Thinking
about the financial position of your household
over the next 12 months, do you expect to be: a lot
better off; a little better off; about the same; a little
worse off; a lot worse off?” The consumer confidence
index is obtained by subtracting the percentage of
respondents who think they will be worse off from
the percentage who think they will be better off.
For a recent examination of the connections between
this index and government popularity in Britain, see
Sanders (1991b).
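
The construction of the two indices described in the notes above can be sketched as follows. The code assumes NumPy; all figures are invented for illustration and are not the series analyzed in this chapter, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
import numpy as np

def standardize(x):
    """Rescale a series to mean zero and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def misery_index(inflation, unemployment, import_prices):
    """Equal-weight sum of the three standardized component series."""
    return (standardize(inflation)
            + standardize(unemployment)
            + standardize(import_prices))

def confidence_balance(pct_better, pct_worse):
    """Percentage expecting to be better off minus percentage expecting worse."""
    return np.asarray(pct_better, dtype=float) - np.asarray(pct_worse, dtype=float)

# Hypothetical figures, for illustration only.
inflation = [10.3, 13.4, 18.0, 11.9, 8.6, 4.6]
unemployment = [5.3, 6.8, 9.6, 10.7, 11.5, 11.8]
import_prices = [100, 109, 118, 126, 134, 145]
misery = misery_index(inflation, unemployment, import_prices)

confidence = confidence_balance([24, 20, 18, 27], [30, 35, 33, 21])
print(misery)
print(confidence)
```

Because each component is standardized before summing, no single component dominates the index merely because of its units of measurement.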

Table 36.1 reports the results derived from
estimating the same model using each of the
four techniques reviewed above. The data are
monthly time series and cover the period June
1979 to June 1987, Mrs Thatcher’s first two
terms of office as prime minister. The models
all assume that while changes in consumer
confidence have a near instantaneous effect on
government popularity, changes in the objective
state of the economy only work through to
popularity after a two-month lag.

Table 36.1 Parameter estimates for models of UK government popularity, June 1979 to June 1987 (standard error in parentheses)

Independent variable                (18a)    (18b)    (19)            (20)        (21) Box-Jenkins model
                                    Simple   Simple   Autoregressive  Lagged      ω parameter   δ parameter
                                    OLS      OLS      model           endogenous
                                                                      variables
                                                                      model
Popularity_{t-1}/AR(1) parameter                      0.91            0.83        0.89
                                                      (0.04)          (0.04)      (0.05)
Consumer confidence                 0.49     0.50     0.15            0.12        0.13          0.66
                                    (0.05)   (0.05)   (0.05)          (0.03)      (0.04)        (0.19)
Misery index_{t-2}                  -2.06    -2.10    -0.30           -0.38       0.11
                                    (0.38)   (0.39)   (1.18)          (0.20)      (0.24)
Falklands-May                       5.81     5.68     4.37            8.63        7.98          0.93
                                    (3.75)   (3.84)   (1.61)          (1.71)      (1.81)        (0.08)
Falklands-June                      8.10     7.95     5.17            5.54        4.32          0.51
                                    (3.75)   (3.84)   (1.62)          (1.71)      (1.76)        (0.41)
Falklands-July                      7.14
                                    (3.75)
Falklands-August                    5.81
                                    (3.75)
Constant                            39.73    39.89    37.85           6.94        38.62
                                    (0.48)   (0.48)   (2.33)          (1.73)      (1.99)
Residual mean square                12.89    13.77    3.24            2.68        3.05
N                                   95       95       95              95          97
Durbin-Watson                       0.45     0.44     1.95            2.06
R²                                  0.54     0.51     0.88            0.90
(adjusted R²)                       (0.51)   (0.48)   (0.88)          (0.90)
Estimated contribution of           0        0        0               1.35        3.10
  Falklands boost to government
  popularity in June 1983

The simple OLS model (18a) at first sight
appears to provide some support for the hypothesized
model shown in Figure 36.5. Although
R² is fairly low (0.54), the parameters for consumer
confidence and the misery index are both
significant and in the predicted direction. And
although the Falklands War coefficients are not
all significant, the war certainly appears to have
boosted government popularity by around 3%
in the early summer of 1982. Dropping the
nonsignificant Falklands dummy variables for
July and August has only a very marginal effect
on the model (18b). However, the very low
values of the Durbin-Watson statistics suggest
that the models suffer from a serious first-order
serial correlation problem, a conclusion that is
confirmed by further diagnostic tests which we
do not report here.¹¹
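
The link between a low Durbin-Watson value and first-order serial correlation can be illustrated with simulated residuals. This is a sketch assuming NumPy; the two residual series are artificial and the autocorrelation of 0.8 is an arbitrary choice, not an estimate from the models in Table 36.1.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
n = 2000
white = rng.normal(size=n)           # serially uncorrelated residuals

# Residuals following an AR(1) process with an assumed coefficient of 0.8.
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print(dw_white, dw_ar1)
```

For uncorrelated residuals the statistic lies close to 2, while for positively autocorrelated residuals it falls towards zero, roughly as twice one minus the first-order autocorrelation.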

The three other models, (19), (20) and (21),
in their own different ways correct for the
problems of serially correlated error.¹² These
models have much lower residual mean square
values (which is another way of saying that the
R² values, where they are calculated, are much
higher) than the simple OLS models. This is not
surprising because (19)-(21) all include either
$y_{t-1}$ or some transformed version of it, in the
form of $u_{t-1}$, in the estimation procedure. There
are other consistent patterns in the results for 
these three models. The coefficient of the con- 
sumer confidence variable remains significant, 
although it is much smaller (between 0.12 and 
0.15) than in the OLS models (where it is 0.49 
or 0.50). Each of the three models also produces 
a significant coefficient for the Falklands-May 
effect, again in contrast to the OLS model which 
yields a nonsignificant coefficient for this vari- 
able. 

Yet in spite of these similarities among the
models summarized in (19)-(21), the results
reported also indicate some important differences.
The most notable of these is the fact that
the lagged endogenous variable method produces
a significant (and, as predicted, negative)
coefficient for the misery index ($\beta = -0.38$),
whereas the autoregressive and Box-Jenkins
methods both yield nonsignificant coefficients
(and in the latter case the wrong sign: $\beta = 0.11$).
In substantive terms, there is clearly a problem
here. If we are prepared to believe the results
of the lagged endogenous variable model, then
we would conclude that the “objective” state of
the economy does exert a direct influence upon
voters’ support for the government; yet if we
believe the results of either the autoregressive
or Box-Jenkins models then we would conclude
that there was no such role for objective economic
factors.

¹¹The standard tests for serially correlated error
are available on many econometric packages. See,
for example, Pesaran and Pesaran’s Datafit program
(1987).

¹²The lagged endogenous variable model shown in
(20) was checked systematically for serially correlated
error; no evidence of serial correlation was found.

A similar problem emerges when we try to 
establish what each of these models implies 
about the impact of the Falklands War on 
the outcome of the 1983 general election. The 
OLS model (18a) implies that the war had no 
effect on the subsequent election whatsoever. It 
boosted popularity substantially in June 1982, 
but its effects were already statistically nonsignificant
by July of that year (see the nonsignificant
coefficients for the Falklands-July
and Falklands-August variables in (18a)). Similarly,
the autoregressive model (19) does not
include any mechanism whereby the significant 
May and June 1982 effects may have a con- 
tinuing impact on the response variable. Both 
models (20) and (21), however, suggest that the 
Falklands effect followed the gradual-temporary 
intervention model described earlier. Popularity
was boosted in May (according to (20) by
8.6%, and according to (21) by 8.0%); was
boosted further in June (in (20) by 5.5% and
in (21) by 4.3%); and subsequently “decayed”
gradually. The rate of decay, however, varies
according to the different models. In the lagged
endogenous variable model, the decay rate is
given by the coefficient on $y_{t-1}$. This implies
that the May boost of 8.6% was worth 8.6 ×
0.83 = 7.2% in June; 8.6 × (0.83)² = 5.9% in
July; 8.6 × (0.83)³ = 4.9% in August; and so on.
The June boost of 5.5% was worth 5.5 × 0.83 =
4.6% in July; 5.5 × (0.83)² = 3.8% in August;
and so on. Combining these two sets of effects
together, the Falklands War was still worth
some 1.3% to the government in June 1983.¹³
Given the relative inaccuracy of opinion poll
data (even when government popularity is measured,
as it is here, by the “poll of polls”),
this result casts considerable doubt on claims
(e.g., Norpoth, 1987; Clarke et al., 1990) that
the domestic political effects of the Falklands
campaign played a decisive role in the Conservatives’
election victory in 1983. On the contrary,
the importance of the consumer confidence
and misery variables in the lagged endogenous
variable model supports the argument that the
1983 election outcome was the result primarily
of economic factors; that the government
had secured sufficient economic recovery by the
summer of 1983 to ensure its re-election (see
Sanders et al., 1987 and Sanders, 1991a).

Yet if (20) suggests that the medium-term
effects of the Falklands War on government
popularity were negligible, the results reported
for the Box-Jenkins model (21) imply that those
effects were a little more substantial. In the
Box-Jenkins model, the $\delta$ parameters explicitly
estimate the rate of decay in their respective
$\omega$ parameters. The $\delta$ parameter for the Falklands-June
variable is not significant (t = 1.23) but the
Falklands-May effect was still worth 3.1% in
June 1983.¹⁴ The clear substantive implication
of this result is that although a Falklands effect
of 3% would probably not have been decisive
in the circumstances of 1983, it would nonetheless
have made an important contribution to
the size of the Conservative election victory in
June of that year. It is more than twice the effect
estimated by the lagged endogenous variable
model.

¹³As there were 13 months from May 1982 to June
1983, and 12 months from June 1982 to June 1983,
the combined effect by June 1983 is given by (8.6 ×
(0.83)¹³) + (5.5 × (0.83)¹²) = 1.3%.

¹⁴Since the Falklands-May $\delta$ parameter estimate is
0.93, the May-boost effect by June 1983 is 7.98 ×
(0.93)¹³ = 3.1. Using the nonsignificant estimated
$\delta$ = 0.51 to denote the decay rate of the June boost
gives a negligible additional effect by June 1983
(4.32 × (0.51)¹² = 0.00).
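
The arithmetic behind these two estimates can be reproduced in a few lines. The sketch below uses only the rounded figures quoted in the text and notes (8.6, 5.5 and 0.83 for the lagged endogenous variable model; 7.98, 0.93, 4.32 and 0.51 for the Box-Jenkins model); the function name `remaining_effect` is hypothetical.

```python
def remaining_effect(boost, decay, months):
    """Size of an initial boost after it has decayed geometrically for `months`."""
    return boost * decay ** months

# Lagged endogenous variable model (20): both boosts decay at the rate 0.83.
lev_total = remaining_effect(8.6, 0.83, 13) + remaining_effect(5.5, 0.83, 12)

# Box-Jenkins model (21): each boost has its own delta parameter.
bj_may = remaining_effect(7.98, 0.93, 13)
bj_june = remaining_effect(4.32, 0.51, 12)

ratio = (bj_may + bj_june) / lev_total
print(lev_total, bj_may, bj_june, ratio)
```

The results are approximately 1.35 for the lagged endogenous variable model and 3.1 for the Box-Jenkins May effect, with a negligible June contribution, so the Box-Jenkins estimate is a little over twice the lagged endogenous variable estimate.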

Where does this leave us? Was there a mea- 
surable (if modest) Falklands factor at the time 
of the 1983 election? Should we conclude that 
the objective state of the economy (as mea- 
sured by the misery index) had no direct effect 
on the electorate’s support for the government 
during the first two Thatcher terms? Unfortu- 
nately, as the foregoing discussion indicates, 
these questions cannot be answered indepen- 
dently of the statistical techniques that are 
employed to investigate them. We can certainly 
dispense with the conclusions suggested by the 
simple OLS model, but there is no easy way 
of resolving the disparities between the autore- 
gressive, lagged endogenous variable and Box- 
Jenkins methods. The only thing that can be 
said definitely is that in this particular case— 
and most likely in others as well—the different 
techniques produce different statistical results 
and, by implication, different substantive con- 
clusions. How, then, can we decide between the 
different techniques? 


6 Which techniques? Assessing 
relative strengths and weaknesses 


Several of the strengths and weaknesses of the 
different techniques have already been men- 
tioned. Here we summarize and qualify them. 
The simple OLS method has the enormous 
advantage of being easily understood; yet, as 
noted above, its frequent contamination by seri- 
ally correlated error often makes it problematic 
for time-series analysis. This said, from an epis- 
temological point of view, the way in which 
the simple OLS technique tests a given model 
does correspond most obviously to what most 
social scientists would regard as “testing a 
causal explanation”. Without wishing to mini- 
mize the enormous difficulties associated with 
the concept of “explanation”, we would argue 
that a causal explanation of a particular phenomenon
or set of phenomena consists in
the specification of the minimum nontautological
set of antecedent necessary and/or
sufficient conditions required for its (their)
occurrence. The simple OLS model, by placing
the response variable on the left-hand side,
allows the researcher to assess how far a knowledge
of the explanatory variables at time $t$, time
$t-1$ and so on, permits accurate predictions to
be made about the response variable at time $t$.
If extremely accurate predictions can be made, 
then it can be concluded that (at the specified 
level of abstraction) the minimum nontautolog- 
ical set of antecedent necessary and sufficient 
conditions required for the occurrence of the 
response variable has indeed been identified; in 
other words that an explanation of the response 
variable has been tested and found to be con- 
sistent with the available empirical evidence. 

Given the epistemological position of the sim- 
ple OLS approach, it comes as something of a 
surprise to discover that many practitioners of 
time-series analysis reject the use of simple OLS 
methods out of hand. Where there are strong 
linear trends in both explanatory and response 
variables this rejection is entirely justified; the 
coincidence of the trends usually suggests that 
some third, unmeasured variable (or set of variables)
is operating to produce a spurious correlation
between $y_t$ and $x_t$.

The solution to this problem adopted by
the autoregression, lagged endogenous variable
and Box-Jenkins techniques (for shorthand
purposes we will refer to these collectively
as AR-LEV-BJ techniques) is to take explicit
account of the extent to which $y_t$ can be
predicted by its own values when any attempt
is being made to estimate the effects of $x_{t-k}$
on $y_t$. In the lagged endogenous variable case,
the effects of any unmeasured variables are
constrained to operate through $y_{t-1}$; in the
autoregressive and Box-Jenkins models the
unmeasured variable effects are constrained
to operate through whatever autoregressive or
ARIMA process appears to determine $y_t$. By
controlling for the unmeasured influences on $y_t$
in this way, the autoregressive, lagged endogenous
variable and Box-Jenkins methods not
only pre-empt the problem of serially correlated
error, but also provide for a more accurate estimation
of the effect on $y_t$ that results from a
unit change in some $x_{t-k}$.

As is so often the case, however, resolving
one problem serves only to raise another; in
this case, an epistemological one. By including
the past history of $y_t$ in the estimation
procedure, the AR, LEV and BJ methods in
effect introduce a version of $y_{t-1}$ on to the
right-hand side of the equation for $y_t$. Yet,
if we go back to our definition of a causal
explanation, we see that it requires the specification
of the “minimum set of nontautological
antecedent... conditions” necessary for the
occurrence of the phenomenon in question.
Since $y_{t-1}$ is certainly not defined independently
of $y_t$, it appears that the AR, LEV and
BJ specifications build a very powerful tautological
component into the explanations of $y_t$
that they imply. Moreover, since tautological
“explanations” are not explanations at all (a
variable clearly cannot explain itself), it would
seem to follow that attempts to explain $y_t$ based
upon AR, LEV or BJ model-building procedures
can never provide “real” explanations at all;
they are merely refined vehicles for specifying
the consequences for $y_t$ of a unit change in $x_t$.
They never allow the analyst to conclude that
“movements in $y_t$ can be (nontautologically)
explained by movements in $x_t$ and the data are
consistent with the proposition that $y_t$ is caused
by $x_t$”. Since this, in our view, is one of the
main goals of empirical analysis, we believe it
constitutes a serious limitation on the AR, LEV
and BJ methods.

Not surprisingly, supporters of AR, LEV and
BJ techniques respond strongly to these criticisms.
They point out that any terms for $y_{t-1}$
can easily be moved across to the left-hand side
of the equation so that the right-hand side effectively
contains expressions that are nontautologically
related to $y_t$. In the case of the LEV
model, for example, moving $y_{t-1}$ from the right-hand
side to the left-hand side of the equation
is equivalent to using the change in $y_t$ as the
response variable. All that is being done, in
short, is to shift the nature of that which is to
be explained from the level of $y_t$ to the change
in $y_t$. If we know the “start” value for $y_t$, we
can easily get back to predicting its subsequent
levels. So far, so good.

Yet the epistemological costs of this notational
shuffling are more serious than its protagonists
imply. This can be seen, firstly, by
reference to R². With simple OLS (assuming
parameter significance and stability, and an
absence of serially correlated error) R² is a
singularly useful statistic: it reveals how well
$y_t$ can be predicted purely from movements
in (nontautological) exogenous variables. However,
with the AR and LEV models (as their
advocates would readily admit), R² is highly
misleading as a guide to the explanatory power
of the nontautological influences on $y_t$ because
it is calculated using an estimation procedure
that explicitly incorporates the past history of
$y_t$. Since the AR, LEV and BJ specifications seek
to explain only the non-self-generated variation
in $y_t$, what is really required is an R² equivalent
that measures the extent to which the exogenous
variables in a particular model can indeed
predict $y_t$. The statistic that is usually employed
in this context is the residual mean square
(RMS). As noted earlier, if the addition of a
particular $x_{t-k}$ yields a lower RMS value than
that obtained by knowing only $y_t$’s past history,
then it is inferred that $x_{t-k}$ does affect $y_t$.
What RMS tests of this sort do not reveal, however,
is how accurately the non-self-generated
variation in $y_t$ can be predicted purely from a
knowledge of the exogenous influences upon it.
Yet, as discussed above, the only sense in which
AR, LEV and BJ techniques explain anything
in conventionally understood causal terms is
that they can account for the non-self-generated
variation in $y_t$. Curiously, the summary statistics
usually associated with these techniques fail
to give any clear indication as to the extent to
which this objective is in fact achieved. With
AR, LEV and BJ methods, in short, not only is
the explicandum (the non-self-generated variation
in $y_t$) a much-reduced version of the original
phenomenon of interest, but there is also a
failure adequately to assess the extent to which
that reduced explicandum is indeed nontautologically
explained.

A second epistemological cost associated
with AR, LEV and BJ methods also derives
from their emphasis on the need to take full
account of the self-generated variation in $y_t$.
The notion that $x_{t-k}$ only affects $y_t$ to the extent
that it explains variation in $y_t$ not explained
by $y_t$’s own past history certainly accords with
the principle of “Granger causality”.¹⁵ Unfortunately,
it also engenders a serious risk of underestimating
the explanatory importance of $x_{t-k}$.
In any given time-series model, it is entirely
possible that the self-generated variation in $y_t$
is also capable of being explained by some
$x_{t-k}$. It is highly unlikely, however, that $x_{t-k}$
will predict $y_t$ as well as $y_t$’s own past values
will predict $y_t$. This, in turn, implies that $x_{t-k}$
may appear to exert no influence on $y_t$ simply
because it explains the same variation in $y_t$ that
is explained by some function in $y_{t-1}$. In situations
such as this, AR, LEV and BJ models are
biased towards the underestimation of exogenous
effects.

All this discussion of self-generated variation,
however, does little to resolve one of the
main substantive problems posed in the previous
section. If, in a specific case, serially correlated
error prejudices a simple OLS model,
which of the AR, LEV and BJ class of models
should we employ in order to analyze repeated
aggregate cross-section data? (The choice of
models clearly matters. For example, as we
saw earlier, the LEV model found evidence
of a significant “misery” effect on popularity,
whereas the AR and BJ methods suggested no
such effect.) There is, sadly, no easy or general
answer to the question. The researcher must
decide which is the more appropriate in the
light of her or his particular theoretical concerns.
This is not to imply that researchers can
simply select the technique which best seems
to provide empirical support for their preconceived
theoretical suppositions. It does mean,
however, that some attempt needs to be made
to link the assumptions of the modeling technique
to the kind of model of human behavior
that the researcher is seeking to test.

¹⁵For an introduction to Granger causality, see Freeman
(1983). Granger’s notion of causality can be summarized
as follows: $x_t$ can be considered to Granger
cause $y_t$ if (a) $y_t$ is influenced both by $x_t$ and by lagged
values of $x_t$ but $x_t$ is not influenced by lagged values
of $y_t$, and (b) if $x_t$ explains variation in $y_t$ after all
self-generated variations in $y_t$ have been taken into
account or eliminated.

In the context of the government popularity 
functions used here, for example, we would 
argue that the entire exercise only makes sense 
if it is possible at some stage to translate the 
parameters of a given model into the decision 
calculus of the individual elector. In our view, 
this requirement renders the LEV method the 
most appropriate of the class of AR, LEV and 
BJ techniques for analyzing government popu- 
larity data. In contrast with the AR and BJ tech- 
niques, all of the terms specified in the LEV 
model are directly measured, so that it can be 
plausibly assumed that the typical elector is 
in some sense aware of them. The inclusion 
of the lagged endogenous variable itself can be 
interpreted as denoting the elector’s predisposi- 
tion to support the government; the exogenous 
variables denote, obviously, the hypothetical 
economic and political influences on govern- 
ment popularity. 

With the AR and BJ techniques, in con- 
trast, the translation is much more difficult. 
Although the same sort of interpretation can 
be made of the exogenous variables as in the 
LEV case, the substantive meaning of $u_{t-1}$ in
the AR model and of the ARIMA process that
is self-generating $y_t$ in the BJ model (both
by definition phenomena that are not directly
observable) is generally far from clear. In these
circumstances, it is often difficult to envisage 
how the coefficients on some of the terms in 
AR and BJ models translate into what might 
conceivably go on inside electors’ heads. 

It is primarily for this reason that we would 
conclude that the lagged endogenous variable 
technique is probably the most appropriate for 
analyzing the sort of data we have described 
in this chapter. This, in turn, leads us to con- 
clude that, of the various results presented in 
Table 36.1, the findings reported in (20) are 
probably the most useful for evaluating the the- 
oretical model that was proposed in Figure 36.5. 
This suggests (a) that both consumer confidence 
and the objective state of the economy (which 
may themselves be interrelated)¹⁶ exert direct
effects on the level of electoral support for the 
government; and (b) that the Falklands War was 
worth just over 1% to the government’s popu- 
larity by the time of the 1983 election. This is 
not to imply, however, that the LEV technique 
is always the best vehicle for handling time- 
series data. Where the nature of the substantive 
problem under investigation means that it is 
unnecessary to translate the parameters of the 
statistical model into some kind of individual- 
level decision calculus, it may well be more 
appropriate to obtain as accurate a definition of 
the error process as possible; and in these cir- 
cumstances, AR or BJ methods would probably 
be more suitable than the LEV technique as it 
has been outlined here. 


7 Conclusion 


In this chapter we have reviewed four
different, though related, techniques for
time-series analysis. We have also attempted to
articulate our doubts, not about the statistical 


16 For a discussion of these connections, see Sanders
et al. (1987).
soundness of AR, LEV and BJ techniques, but
about their epistemological implications; about 
the limitations on their ability to evaluate 
“explanations” as they are conventionally 
understood. We certainly do not claim to 
have provided a definitive analysis of the 
epistemological difficulties encountered with 
these techniques, merely to have articulated 
some genuine sources of concern which, in our 
view, all practitioners of time-series analysis 
should at least consider. 

Time-series analysis with political and social 
data is necessarily a highly judgemental 
process. Apart from the continuing need to 
avoid models that exhibit serially correlated 
error, there are few hard and fast rules that 
must be followed in all circumstances. Indeed, 
any method that seeks to impose a strict set of 
rules to be followed will tend to founder on the 
need for constant interplay between the ana- 
lyst’s theoretical ideas and the way in which 
they use particular statistical techniques. Time- 
series analysis with political and social data is 
not the hard “science” of the econometricians. 
It involves the evaluation of causal propositions 
by reference to concepts that are imperfectly 
measured and techniques that are rarely alto- 
gether appropriate for the task. It is, in essence, 
art with numbers. 


Appendix: effects of differencing 
on two hypothetical time series 


Detrending by differencing can lead to substantively
spurious conclusions if the trends
in endogenous and exogenous variables are
causally related. A familiar paradox is displayed
in Figures 36.6 to 36.8. The (hypothetical)
$y_t$ and $x_t$ variables are clearly negatively
related over the long term (each reaches its
maximum/minimum at the same point). Yet if both
$y_t$ and $x_t$ are differenced to render each series
stationary (as in Figure 36.8), the transformed,
second-differenced variables appear to be positively
related, even though common sense suggests
that such an inference is inappropriate.
Figure 36.6 Hypothetical $y_t$ and $x_t$ over time

Figure 36.7 Changes in $y_t$ and $x_t$ (first differences) over time

Figure 36.8 Changes in $\nabla y_t$ and $\nabla x_t$ (second differences) over time

In circumstances such as this, an insistence that
all variables must be rendered stationary can 
produce misleading conclusions. At worst—in 
situations where there is a considerable amount 
of measurement error in both y, and x, and 
where the measured variables therefore only 
track the broad trends in the phenomena under 
investigation—the removal of trends through 
differencing can lead to a statistical analysis 
based almost exclusively on the correlation of 
measurement error. 
Chapter 37


Differential equation models 
for longitudinal data 
Steven M. Boker 


One reason that longitudinal data are collected 
is to understand dynamic processes that might 
give rise to time-dependent relationships in 
these measured variables. When intensive lon- 
gitudinal designs have been employed, con- 
sisting of more than thirty or so occasions 
of measurement, continuous time differential 
equations may be good candidates for empiri- 
cally testing hypotheses about dynamic pro- 
cesses. This chapter introduces the reasoning 
behind the use of continuous time differen- 
tial equations models, explores a few of the 
simple popular forms of differential equations 
that have been used in the behavioral sci- 
ences, and provides an introduction to one of 
the methods that is currently in use for fit- 
ting differential equations models to intensive 
longitudinal data. 

Individuals change over time. This change 
may be minute to minute fluctuation in vari- 
ables like mood or anxiety, longer-term change 
in variables like attitudes, or very long-term 
developmental change in variables like mea- 
sures of cognitive abilities. If an individual is 
changing in such a way that his current state 
influences his future state, then one may find 
that differential equations compactly describe 
hypotheses about the process by which this 
time-dependence can lead to a variety of indi- 
vidual trajectories of change. That is to say, we 


do not expect that everyone’s pattern of change 
will look alike. However, the same lawful inter- 
nal process may lead to what appear to be quali- 
tatively different patterns of change for different 
individuals. 

There are several ways that observed indivi- 
dual differences in intraindividual change tra- 
jectories may occur. Let us consider three of 
them. First, there may be essentially random 
exogenous influences that may also influence 
the future state, so the future state may only be 
partially dependent on the current state. Sec- 
ond, there may be quantitative individual dif- 
ferences in the way that the current state leads 
to the future state for each individual. Third, 
an individual may adapt the way that his cur- 
rent state leads to his future state in response 
to some change in the environment. Differential 
equations modeling allows one to succinctly 
formalize each of these ways in which interindi- 
vidual differences in process may lead to dif- 
ferences in intraindividual change. 

Intraindividual change may be referenced as 
being in relation to an equilibrium set. That 
is, the internal process leading to intraindivi- 
dual change is somehow organized around the 
equilibrium set. For instance, a simple equili- 
brium set could be a single personal goal. One 
might observe behavior of an individual that 
regulated itself so as to bring the individual 
closer to this personal goal. Another equilibrium
set might be a homeostatic set point. One
might measure a behavioral variable that fluctu- 
ated around the set point, while not leaving its 
immediate neighborhood of values. But an equi- 
librium set does not need to be a single value; 
an equilibrium set may be a cycle, such as the 
circadian wake-sleep cycle. Or, an equilibrium 
set might itself be undergoing developmental 
change such as when crawling turns to walking 
in toddlers. 

Differential equations models allow one to 
formalize the relationship between observed 
short-term patterns of individual change from 
one observation to the next and the overall pat- 
tern of changes that could have been observed 
if the individual had been observed starting in 
any particular state. That is to say, differential 
equations models infer the overall organization 
of the intraindividual change with respect to 
an equilibrium set from the observed sample of 
short-term intraindividual changes. 

In order to make the preceding ideas more 
concrete, let us examine three simple linear 
differential equations. A continuous time dif- 
ferential equation is a model for instantaneous 
change. The term instantaneous change as it is 
used here does not imply that a person changes 
some discrete amount, say 10 points on an abili- 
ties measure, from one moment to the next. 
Instead, it is saying that there is a relationship 
between the value of change and the interval of 
time over which the change happens, and that 
this ratio exists for every moment in time during
the interval. This ratio is expressed as a derivative
with respect to time. For instance, the first
derivative with respect to time of a variable $x$
at time $t$ can be expressed as $dx(t)/dt$, which we
will write in shorthand as $\dot{x}(t)$. This simply says
that at some chosen instant of time $t$ the slope
of $x$ exists and we have an estimate of it, $\dot{x}(t)$.
Similarly, we can talk about the change in that
slope. At the chosen time $t$, the change in the
first derivative would be $d(dx(t)/dt)/dt$, which
is shortened to $d^2x(t)/dt^2$ and which we will
write in shorthand as $\ddot{x}(t)$. Thus, at a chosen
time $t$ there exists a curvature for the variable
$x$ and we have an estimate of it, $\ddot{x}(t)$.

Consider a single outcome variable x that is 
a linear function of time, so that 


$$x(t) = b_0 + b_1 t \tag{1}$$

where $b_0$ is the intercept and $b_1$ is the slope. Be
sure to note that $b_1$ has units associated with
it: change in $x$ per unit change in time. For
instance, if $x$ were the position of your car and
you were traveling at a constant velocity while being
photographed from the air, the value for $b_1$ might
be in units of miles per hour, or meters per
minute. Without the units for $b_1$, it does not
have a meaning that can be interpreted from
one experiment to the next. This is an important
point that relates back to the definition of
the first derivative in the previous paragraph:
in experimental data, derivatives always have
units associated with them (such as miles per
hour, meters per second). Taking the derivative
of equation 1 with respect to time, we
find that

$$\dot{x}(t) = b_1 \tag{2}$$

This is just as we would expect: the slope is
$b_1$ no matter what time $t$ we select. So, here is
our first example differential equation. We are
predicting the change in $x$ and it is constant for
any time $t$.

But where is $b_0$ in equation 2? The intercept
$b_0$ is the value of $x$ when $t = 0$. We are
no longer predicting $x$, so there is no need for
$b_0$; it drops out during the differentiation with
respect to time. This is one major difference 
between using a differential equation and using 
the integral form as shown in equation 1, which 
explicitly predicts a value for the variable x. 
Differential equations do not specify what hap- 
pens at time t= 0. This can be a substantial 
benefit when one is not interested in how x 
changes with respect to time t = 0, but rather 
is interested in how x might change given any 
particular value of x at an arbitrarily assigned
value of t=0. 

In integral form models for change, it is 
important to be able to assign a time t = 0 and 
that this value for time is the same for everyone. 
Sometimes this is possible, for instance when 
one is interested in growth curves of children’s 
height. But in other circumstances, for instance 
a daily diary study of mood, one is hard pressed 
to say that the first occasion of measurement 
in the study actually has the same meaning for 
everyone. One person might start the study the 
day after winning the lottery and another the 
day after a car accident. Why would the experimenter
wish to equate the meaning of time $t = 0$
for these two individuals? The differential
equations model approach sidesteps this whole 
issue since it expresses a model that says if a 
person is in some particular state at a selected 
time t we predict that person will be chang- 
ing at some instantaneous rate. By adding up 
these changes over an interval of time (integrat- 
ing over an interval of time) we can predict an 
expected trajectory for that person. Thus, the 
differential equation model does not specify a 
particular trajectory. Rather, one may think of 
differential equations models as specifying a 
family of trajectories; each of which has a start- 
ing point, a set of states called initial conditions 
at some selected time t = 0. 

Next, consider a bit more complicated model 
for change. Suppose that the slope of x is a 
function of the value of x so that 


$$\dot{x}(t) = b_1 x(t) \tag{3}$$

This model for change says that the slope of
$x$ is proportional to the value of $x$. If $b_1 < 0$,
then this model suggests that if we knew that
$x$ at some chosen time $t$ was a positive value,
then the slope would be a negative value. Thus
as time progressed, $x$ would approach 0. Similarly,
if $x(t)$ were negative, the slope would be
positive and so as time progressed $x$ would
approach 0. Again, $b_1$ has units associated with
it: if $x$ is a distance in meters and time is in
seconds, the slope $b_1 x(t)$ is expressed in meters
per second, so $b_1$ itself is a rate per second. More
to the point for behavioral variables, if $x$ is in
population standard deviation units and the experiment
were a daily diary study, the slope would be in
units of standard deviations per day and $b_1$ in
units of per day.

When the coefficient $b_1$ is negative in equation 3,
the trajectory that is produced is a negative
exponential. To see this, we can integrate
equation 3 to find that

$$x(t) = b_0 e^{b_1 t} \tag{4}$$


Again, we find that we have an intercept term 
that enters into the picture, but this time as 
a scaling of the negative exponential function 
of time. 
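
The integration step can be sketched by separation of variables. This is an illustrative derivation rather than part of the original text; it assumes $b_0 = x(0) \neq 0$, so that $x(t)$ keeps the sign of its initial value.

```latex
\begin{aligned}
\frac{dx}{dt} = b_1 x
  \;&\Longrightarrow\; \frac{dx}{x} = b_1\,dt \\
\int_{x(0)}^{x(t)} \frac{dx}{x} = \int_0^t b_1\,ds
  \;&\Longrightarrow\; \ln\lvert x(t)\rvert - \ln\lvert x(0)\rvert = b_1 t \\
x(t) &= x(0)\,e^{b_1 t} = b_0\,e^{b_1 t}
\end{aligned}
```

Differentiating the last line recovers equation 3, since $\frac{d}{dt}\,b_0 e^{b_1 t} = b_1 b_0 e^{b_1 t} = b_1 x(t)$.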

The expression of x as a nonlinear function 
of time in equation 4 may seem unfamiliar to 
many behavioral scientists. However, it is actu- 
ally widely used in its discrete form without 
people realizing that their model implies a non- 
linear function of time. Consider the following 
discrete time forward prediction model where 


$$x(t + \Delta t) = c_1 x(t) \tag{5}$$

where $\Delta t$ is the interval between occasions of
measurement. This seemingly linear autoregres- 
sive model has a nonlinear component that is 
hidden in it. 

To see this, consider the graphs in
Figure 37.1(a), (b), and (c). These graphs appear
identical to one another. Figure 37.1(a) was created
by the differential equation in equation 3,
choosing $x(0) = 4$ and $b_1 = -0.05$. Figure 37.1(b)
was created by the integral form in equation 4,
choosing $b_0 = 4$ and $b_1 = -0.05$. Finally,
the discrete measurements in Figure 37.1(c)
were created by setting $x(0) = 4$, $\Delta t = 1$, and
$c_1 = e^{-0.05}$. Thus, the autoregressive parameter
$c_1$ is a nonlinear function of the interval
of time between successive measurements. The
parameters of the differential and integral form
are independent of the measurement interval.
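
The equivalence of the three forms can be checked numerically. The following is a minimal sketch, not the code used to draw Figure 37.1: it reuses the values quoted above ($x(0) = 4$, $b_1 = -0.05$, $\Delta t = 1$), while the number of occasions, the Euler step size, and all variable names are assumptions made for illustration.

```python
import math

b0, b1, dt = 4.0, -0.05, 1.0    # values quoted for Figure 37.1
n_steps = 50                    # assumed number of occasions

# Integral form (equation 4): x(t) = b0 * exp(b1 * t)
integral = [b0 * math.exp(b1 * k * dt) for k in range(n_steps + 1)]

# Discrete form (equation 5): x(t + dt) = c1 * x(t), with c1 = exp(b1 * dt)
c1 = math.exp(b1 * dt)
discrete = [b0]
for _ in range(n_steps):
    discrete.append(c1 * discrete[-1])

# Differential form (equation 3), integrated with many small Euler steps
def euler_end_value(x0, rate, t_end, h=1e-4):
    """Add up the instantaneous changes h * rate * x from t = 0 to t_end."""
    x = x0
    for _ in range(int(round(t_end / h))):
        x += h * rate * x
    return x

differential_end = euler_end_value(b0, b1, n_steps * dt)

# Largest disagreement between the integral and discrete trajectories
max_gap = max(abs(a - d) for a, d in zip(integral, discrete))

# The autoregressive weight changes with the measurement interval; b1 does not
c1_half_interval = math.exp(b1 * 0.5 * dt)
```

Halving the measurement interval changes the autoregressive weight from $c_1$ to $\sqrt{c_1}$, whereas $b_0$ and $b_1$ are unchanged, which is the sense in which the differential and integral parameters are independent of the sampling interval.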

In order to plot a single person’s trajectory 
over time, we needed to choose an initial value 
Figure 37.1 Negative exponential decay function as specified by (a) the differential equation in equation 3,
(b) the integral form in equation 4, and (c) the discrete time form of autoregression in equation 5 


for x at time 0. Thus, the differential form, the
integral form, and the discrete form of these 
equations are equivalent when an initial value 
is known. If there are individual differences 
in initial values, the integral form and discrete 
form must be estimated as a random coefficients 
(i.e., mixed effects) model in order for the mod- 
els to be equivalent. In this case the integral 
form or discrete form have the advantage of giv- 
ing estimates of the interindividual differences 
in initial values, if that is important to the ques- 
tion at hand. If individual differences in initial 
values are unrelated to the research question, 
then the differential form simplifies the statis- 
tical model to be fit. 


1 Second order equations 


So far, we have considered only the simplest 
first order linear differential equations, equa- 
tions that only involve the first derivative of 
a variable. Second order linear equations can 


exhibit many types of behavior, including expo- 
nential decline or increase, increase and then 
exponential decline, and oscillations as shown 
in Figure 37.2. 

A second order linear differential equation in 
one variable may be written as 


$$\ddot{x}(t) = b_1 x(t) + b_2 \dot{x}(t) \tag{6}$$

where $\ddot{x}(t)$ is the second derivative of $x$ with
respect to time. This equation expresses the
expected simultaneous relationships between
the time derivatives of $x$ at one moment in
time $t$. If both $b_1 < 0$ and $b_2 < 0$ this system is
called a damped linear oscillator. If $b_1 < 0$ and
$x(t) > 0$ the effect on the second derivative is
negative, i.e., the slope is becoming more negative.
Thus, the farther $x$ is from 0 in a positive
direction, the more the slope tends to
become negative. The effect over time is that
$x$ is driven back towards the equilibrium value
0. Similarly, if $b_1 < 0$ and $x(t) < 0$, the effect on
the second derivative is positive and so the
slope is becoming more positive and thus the
result is that $x$ is again driven back towards
equilibrium. This oscillation would continue
indefinitely except that if $b_2 < 0$ the greater the
slope, the more the slope changes to be close to
zero. When $b_2 < 0$ the system is said to damp to
equilibrium.

Figure 37.2 The same second order differential equation (equation 6) can produce (a) increase followed by
decrease, (b) exponential decrease, and (c) oscillations depending on its parameters and initial values
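
The damped oscillation described above can be reproduced with a short simulation. This is a sketch under stated assumptions: the parameter values, initial conditions, step size, and the choice of a classical fourth-order Runge-Kutta integrator are illustrative and are not taken from the chapter. Note that visible oscillation additionally requires $b_2^2 + 4 b_1 < 0$ (complex roots of $r^2 - b_2 r - b_1 = 0$); with heavier damping the same equation decays without oscillating.

```python
def simulate(b1, b2, x0, v0, t_end, h=0.01):
    """Integrate x'' = b1*x + b2*x' with classical RK4.

    Returns a list of (t, x, v) tuples, where v is the first derivative.
    """
    def deriv(x, v):
        # First-order system: dx/dt = v, dv/dt = b1*x + b2*v
        return v, b1 * x + b2 * v

    t, x, v = 0.0, x0, v0
    out = [(t, x, v)]
    for _ in range(int(round(t_end / h))):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = deriv(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += h
        out.append((t, x, v))
    return out

# Hypothetical values: b1 < 0 pulls x back to 0, b2 < 0 damps the slope
traj = simulate(b1=-0.5, b2=-0.1, x0=4.0, v0=0.0, t_end=60.0)
xs = [x for _, x, _ in traj]

# Each sign change is one crossing of the equilibrium value 0
sign_changes = sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)

# Amplitude early versus late in the record shows the damping
early_peak = max(abs(x) for x in xs[: len(xs) // 3])
late_peak = max(abs(x) for x in xs[2 * len(xs) // 3:])
```

Setting `b2=0.0` in the call removes the damping, and the amplitude then stays essentially constant, which corresponds to the oscillation that "would continue indefinitely".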

There are many other forms of differential 
equations: higher order equations with third or 
fourth derivatives, equations that are nonlinear 
in their variables, equations that are nonlinear 
in their parameters, or equations that have time- 
dependent parameters (the interested reader is 
referred to Ellis, Johnson, Lodi and Schwalbe, 
1992; Hubbard and West, 1991, 1995; Kaplan 
and Glass, 1995; Thompson and Stewart, 1986; 
Wylie, 1979). Furthermore, any of the pre- 
ceding types of equations may be combined 
into coupled systems of equations: simultane- 
ous equations whose time-dependent trajecto- 
ries are interdependent on one another. 


2 Some methods for estimating 
parameters 


Once a model has been formulated in order 
to express the relationship between derivatives 
of a variable, the model can be fit to repeated 
observations data. There are a variety of tech- 
niques for estimating parameters of differential 
equations. These techniques fall into two main 
categories, integral forms and differential forms. 

Differential equations estimation has a long 
history, tracing back to Hotelling (1927) who 
posed the problem of differential equations subject
to error. The stochastic integral was introduced
by Itô (1951) and this method has been
furthered by several researchers (Bergstrom, 
1966; Arminger, 1986) and applied in an exact 
discrete (Singer, 1993) and approximate dis- 
crete (Oud and Jansen, 2000) form to repeated 
observations data. These models have been 
shown to improve on cross-lag panel models by 
using the continuous time form of the first order 


differential equation equivalent (Oud, 2007). 
These methods use the integral form of the dif- 
ferential equation, so either an analytic integral 
or numerical approximation must be derived 
prior to fitting the model. 

Another technique for fitting differential 
equation models in the integral form to time- 
series data involves using Kalman Filters 
(Kalman, 1960) or Extended Kalman Filters to 
fit a state-space linearization of the differen- 
tial equation in question (for details see Harvey, 
1989; Chatfield, 2004). These techniques show 
considerable promise for the flexible estima- 
tion of parameters of linear and nonlinear sys- 
tems with time-varying parameters and have 
begun to be used in the behavioral sciences 
(Chow, 2006). 

One discrete method that has been proposed 
casts the differential equation in terms of a 
latent difference score equation (Hamagami and 
McArdle, 2007; McArdle, 2000). This method 
uses latent variables to calculate difference 
scores in order to predict first order change. 
This method is a latent variable extension to 
first order autoregressive time-series methods 
that make linear predictions forward in time in 
a manner similar to equation 5. This method 
has the advantage of ease of specification and 
its latent difference score is easy to under- 
stand. However, its parameters are a nonlinear 
function of the differential equation parameters 
and the lag between measurements (recall that 
$c_1 = e^{b_1 \Delta t}$).

One differential form estimation two-step 
procedure was proposed by Boker and Graham 
(1998) and Boker and Nesselroade (2002) who 
used simplified local linear approximation to 
generate estimated derivatives and then fit 
the differential form of the model directly 
to the resulting derivatives. This method has 
been applied to coupled systems (Boker, 2001; 
Butner, Amazeen and Mulvey, 2005) and has 
been applied in a multilevel coupled context 
where individual differences in parameters are 
predicted by second level variables (Boker and 
Laurenceau, 2005; Maxwell and Boker, 2007).
While this method has an advantage of simplicity,
its parameters are prone to bias if optimal
time delays are not available (Boker and
Nesselroade, 2002).
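
The two-step logic can be illustrated with a small sketch. It assumes the usual central-difference form of local linear approximation (derivative estimates from observations $\tau$ occasions either side of a focal occasion) and a simple least-squares regression through the origin as the second step; the exact estimator and fitting procedure in the cited papers may differ, and the function names are hypothetical.

```python
import math

def lla_derivatives(x, tau, dt):
    """Central-difference estimates of the first and second derivative.

    For each interior occasion j, uses x[j - tau], x[j], x[j + tau].
    Returns a list of (x_j, dx_j, d2x_j) tuples.
    """
    h = tau * dt
    rows = []
    for j in range(tau, len(x) - tau):
        dx = (x[j + tau] - x[j - tau]) / (2 * h)
        d2x = (x[j + tau] - 2 * x[j] + x[j - tau]) / (h * h)
        rows.append((x[j], dx, d2x))
    return rows

def slope_through_origin(pred, resp):
    """Least-squares slope for resp = b * pred (no intercept)."""
    return sum(p * r for p, r in zip(pred, resp)) / sum(p * p for p in pred)

# Step 0: a noise-free series generated from equation 4 (illustrative values)
b0, b1_true, dt = 4.0, -0.05, 1.0
series = [b0 * math.exp(b1_true * k * dt) for k in range(100)]

# Step 1: estimate derivatives; Step 2: regress the slope on the level
rows = lla_derivatives(series, tau=1, dt=dt)
b1_hat = slope_through_origin([r[0] for r in rows], [r[1] for r in rows])

# A much longer delay recovers the parameter less accurately
rows_wide = lla_derivatives(series, tau=8, dt=dt)
b1_hat_wide = slope_through_origin(
    [r[0] for r in rows_wide], [r[1] for r in rows_wide]
)
```

Even without measurement error the estimate depends on the delay chosen, which gives some intuition for why the choice of time delay matters for this method.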

The remainder of the chapter focuses on 
another method for fitting the differential form 
of a differential equation to repeated observa- 
tions: latent differential equations. 


3 Latent differential equations 


Latent differential equations (LDEs) use struc- 
tural equation modeling to simultaneously esti- 
mate latent derivatives of a time series and fit 
a structural model to the covariances between 
those derivatives (Boker, Neale and Rausch, 
2004). This method estimates measurement 
error, which does not affect the trajectory of 
the system over time, and dynamic error, which 
does affect the system’s trajectory. The method 
is less prone to parameter bias over a range of 
time delays between occasions of measurement 
than is the local linear approximation method 
referred to previously. 

LDE uses a form of Savitzky-Golay filtering 
(Savitzky and Golay, 1964) to construct a constrained
loading matrix similar to that of a latent
growth curve structural model. The data are 
time series that have been put into a time-delay 
embedded (i.e. state space) matrix with five 
time delay lags on every row of the matrix. The 
method for constructing this matrix is described 
in detail in the next section. The covariance 
structure between the latent variables of this 
model can be used to specify a differential equa- 
tion model as in, for instance, equations 3 or 6. 


3.1 Time-delay embedding 


Most methods that estimate differential equa- 
tions models use some form of state space embed- 
ding. Time-delay embedding constructs a form 
of state space in which the data columns are 
time lagged. The simplest form of a time-delay 
embedded matrix is a pre-post design where 


the first column of the matrix is the obser- 
vation before an intervention and the second 
column is the same variable measured some 
time $\Delta t$ after the first observation, and presumably
after the intervention. Each row in this
matrix is a different individual, and the assump- 
tion is made that the individual differences in 
the change between column 1 and column 2 
provide a measure of the process of interest. 

Suppose there are five equal interval repeated 
observations per person. The data may again 
be arranged so that each row contains observa- 
tions from one individual and the time delay 
between columns is again $\Delta t$. Each row of the
resulting matrix thus represents a total interval
of time equal to the number of columns minus
one times the delay between columns, in our
case $(5-1)\Delta t$. Again, the assumption is made
that the process evolving during the time that 
elapses in the interval between the occasions of 
measurement in the first column and the last 
column in the row will manifest itself in the 
relationship between the data in the 5 columns. 
It is assumed that to the degree that individuals 
in the sample are representative of the popu- 
lation, statistics calculated using the relation- 
ships between columns will be representative 
of the process of interest. Under assumptions 
of ergodicity, we can use a time-delay embed- 
ding to estimate the parameters of differential 
equation models (for details see Noakes, 1991; 
Sauer, Yorke and Casdagli, 1991; Takens, 1985; 
Whitney, 1936). 

Now, suppose that we have 100 observations 
per person. If the relationships between the 
5 columns in the previous matrix were suf- 
ficient to capture the time evolution of the 
process, we do not need more columns in our 
data matrix. But since we have 100 observations 
per person we wish to use these data effectively. 
If the data for person 1 on occasions 1 through 
5 is {x(1,1),x(1,2),x(1,3),x(1,4),x(1,5)}, then 
we could consider that to be one observation 
of the time evolution of the process of inte- 
rest. Another observation from the same person 
would be {x(1,2), x(1,3), x(1,4), x(1,5), x(1,6)}.
In this way we could march through the first
person’s data constructing $100 - 4 = 96$ observations
of the time evolution of the process. Thus
we can add 96 rows to our time-delay matrix 
for each 100 observations per person. 

But what if the total delay in each row ($4\Delta t$)
is not long enough to capture the change in 
which we are interested? It might be that the 
process evolves relatively slowly and a longer 
delay is necessary to observe the change. The 
Takens (1985) embedding theorem says that the 
time-delay embedding technique is sufficient 
if enough columns are chosen and the time 
between columns is not poorly chosen. In prac- 
tice, there will be a time delay between the 
first and last column that will work best, max- 
imizing the ratio of the reliable change over 
the total change. When we have many measure- 
ments, such as the 100 measurements postu- 
lated above, we can vary the interval between 
columns by selecting an occasion indexing 
parameter $\tau$ that gives the number of occasions
to lag between each column. In the example
in the previous paragraph we used $\tau = 1$. In
general the system is unknown, and so $\tau$ is
frequently selected over a range of values and
models fit to embedded data from each selected
value of $\tau$ so as to test the stability of parameter
estimates.

Consider a univariate time series $X$ where
individuals $i = 1 \ldots N$ are observed on occasions
$j = 1 \ldots P$ separated by a fixed time interval
$\Delta t$. We will create a $d = 5$ dimensional
embedding (5 lagged data columns) such that
the time delay between embedded columns is
$\tau\Delta t$ where $\tau = 2$. The embedding delay will thus
be twice the interval between occasions of measurement.
If the original time series $X$ is ordered
by occasion $j$ within individual $i$ then the series
of $x_{(i,j)}$ can be written as a vector of scores

$$X = \{x_{(1,1)}, \ldots, x_{(1,P)},\; x_{(2,1)}, \ldots, x_{(2,P)},\; \ldots,\; x_{(N,1)}, \ldots, x_{(N,P)}\} \tag{7}$$


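The five dimensional ($d = 5$) embedding $X^{(5)}$
where $\tau = 2$ can then be written as a matrix with
five columns such that

$$
X^{(5)} =
\begin{bmatrix}
x_{(1,1)} & x_{(1,3)} & x_{(1,5)} & x_{(1,7)} & x_{(1,9)} \\
x_{(1,2)} & x_{(1,4)} & x_{(1,6)} & x_{(1,8)} & x_{(1,10)} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
x_{(1,P-8)} & x_{(1,P-6)} & x_{(1,P-4)} & x_{(1,P-2)} & x_{(1,P)} \\
x_{(2,1)} & x_{(2,3)} & x_{(2,5)} & x_{(2,7)} & x_{(2,9)} \\
x_{(2,2)} & x_{(2,4)} & x_{(2,6)} & x_{(2,8)} & x_{(2,10)} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
x_{(2,P-8)} & x_{(2,P-6)} & x_{(2,P-4)} & x_{(2,P-2)} & x_{(2,P)} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
x_{(N,1)} & x_{(N,3)} & x_{(N,5)} & x_{(N,7)} & x_{(N,9)} \\
x_{(N,2)} & x_{(N,4)} & x_{(N,6)} & x_{(N,8)} & x_{(N,10)} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
x_{(N,P-8)} & x_{(N,P-6)} & x_{(N,P-4)} & x_{(N,P-2)} & x_{(N,P)}
\end{bmatrix}
\tag{8}
$$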


Each row of $X^{(5)}$ is a short within-person
sequence of observations where an interval of
$\tau\Delta t$ separates each column. Each individual’s
data are used fully as long as there are at least 
P > 10 occasions of measurement for each indi- 
vidual. Finally, note that the data from indi- 
vidual i never appears on the same row with 
individual i+1; thus data from one person’s 
process does not overlap with data from another 
individual. 
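
The construction in equation 8 can be sketched as a short function. This is an illustrative implementation, not the author's code; the function name `time_delay_embed` and the list-of-lists data layout are assumptions. Observations are labeled with (person, occasion) pairs here so that the output can be compared directly with the matrix above.

```python
def time_delay_embed(series_by_person, d=5, tau=2):
    """Build a time-delay embedded matrix as a list of rows.

    series_by_person: one list of equally spaced observations per person.
    d: number of embedding columns; tau: occasions to lag between columns.
    Rows never mix observations from different persons.
    """
    span = (d - 1) * tau            # occasions between first and last column
    rows = []
    for series in series_by_person:
        # A person with P observations contributes P - span rows
        for start in range(len(series) - span):
            rows.append([series[start + k * tau] for k in range(d)])
    return rows

# Two persons, P = 12 occasions each, labeled (i, j) with 1-based indices
people = [[(i, j) for j in range(1, 13)] for i in (1, 2)]
embedded = time_delay_embed(people, d=5, tau=2)

# The earlier example: 100 occasions, d = 5, tau = 1
rows_for_100 = time_delay_embed([list(range(100))], d=5, tau=1)
```

With $P = 12$, $d = 5$ and $\tau = 2$ each person contributes $12 - 8 = 4$ rows, and with 100 occasions and $\tau = 1$ the function returns the 96 rows mentioned earlier in this section.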

In order to construct a time-delay embed- 
ding we must determine how many columns 
are in the embedding. For use with an LDE 
model, one will frequently use between d = 4 
and d=6 columns. Four columns is the mini- 
mum number of columns that is still identified 
for a second order differential equation model. 
Five columns has, in simulations, proved to be 
somewhat more stable in the case of a second 
order linear differential equation, and allows 
the estimation of models other than the simplest 
second order model. More than d = 6 columns 
can be used, but in any model that includes 
the possibility of oscillation one must take care
that the elapsed time between the first column 
and the last column in the embedding does not 
exceed one-half the elapsed time between peaks 
of the oscillation. Thus, $d\tau\Delta t$ must be less than
one-half the period of the oscillation if the LDE 
method is to be applied to oscillating data. 


3.2 First order LDE


In the next sections we will fit linear first 
and second order differential equations as were 
specified earlier. But now we will add a resi- 
dual term so that for the first order linear 
differential equation we have 


$$
\dot{x}_t = b_1 x_t + e_t \tag{9}
$$


where $\dot{x}_t$ is the first derivative of x with respect
to time at time t, $x_t$ is the displacement of the
variable x from its equilibrium value at time t,
and $e_t$ is an approximately normally distributed
independent residual. We can use a slope and
intercept latent growth model with fixed loadings
to estimate the parameter of interest, $b_1$,
from a five-dimensional time-delay embedded
data matrix as shown in the path diagram in
Figure 37.3.


Figure 37.3 Path diagram of a first order linear
differential equation specified as an LDE model
with five indicators (i.e., a five-dimensional
time-delay embedding of the time-series data)

The loading matrix L will, in this case, be
of order 5 x 2 and is specified so that the first
column is a column of 1's and the second column
is a linear basis function scaled by the
interval between the columns, $\tau\Delta t$, and centered
on the middle column of the time-delay
embedded matrix.

$$
L = \begin{bmatrix}
1 & -2\tau\Delta t \\
1 & -\tau\Delta t \\
1 & 0 \\
1 & \tau\Delta t \\
1 & 2\tau\Delta t
\end{bmatrix}
\tag{10}
$$


In this way, the latent variables $x$ and
$\dot{x}$ in Figure 37.3 will be the intercept and
slope centered around the observation x3, the
third column of the five-dimensional time-delay
embedded data matrix. The structural
part of the model can be specified using RAM
matrices (McArdle and McDonald, 1984) as

$$
A = \begin{bmatrix} 0 & 0 \\ b_1 & 0 \end{bmatrix}
\tag{11}
$$

$$
S = \begin{bmatrix} V_x & 0 \\ 0 & V_{\dot{x}} \end{bmatrix}
\tag{12}
$$


If we now define a 5 x 5 diagonal matrix E to
contain the five residual variances for the five
indicators $(V_{ex1}, V_{ex2}, V_{ex3}, V_{ex4}, V_{ex5})$, we can
calculate the expected covariance matrix R for
this model as

$$
R = L(I - A)^{-1} S \left((I - A)^{-1}\right)' L' + E
\tag{13}
$$


The estimate for $b_1$ corresponds to the parameter
in equation 9 while the estimate for $V_{\dot{x}}$
corresponds to the variance of the residual $e_t$.
Note that the residual variances in E are the
portion of the variance of the indicators that
does not conform to the latent model where
only an intercept and slope are used to account
for the time dependence in each row of the
time-delay embedded data matrix. The model
may be misspecified if higher order derivatives
are required to account for this time dependence.
The variance $V_{\dot{x}}$ estimates the portion
of the variance of the slope that cannot be
accounted for by the displacement from equilibrium,
i.e., the intercept. This residual provides
an estimate of the slope variance that
does not conform to a linear first order differential
equation. To the extent that $V_{\dot{x}}$ is large, it
may be that there are exogenous influences with
effects that propagate linearly over time. Or it
may be that the linear relationship between $x$
and $\dot{x}$ is insufficient to capture the relationship
between displacement from equilibrium and its
first derivative.


3.3. Second order LDE 


In this section we fit a linear second order dif- 
ferential equation specified as 


$$
\ddot{x}_t = \eta x_t + \zeta \dot{x}_t + e_t \tag{14}
$$


where $\dot{x}_t$ and $\ddot{x}_t$ are the first and second derivatives
respectively of x with respect to time at a
particular time t, $x_t$ is the displacement of the
variable x from its equilibrium value at time t,
and $e_t$ is an approximately normally distributed
independent residual. The coefficients $\eta$ and $\zeta$
are frequency and damping coefficients respectively
of the damped linear oscillator formed
when $\eta < 0$. This model is one of the simplest
methods for accounting for self-regulating
systems that have a stable equilibrium or set
point. The farther the system is from equilibrium
(when $x_t$ is large), the more the system
curves ($\ddot{x}_t$ becomes large and of opposite sign
to $x_t$) back towards equilibrium. The more
negative $\eta$ is, the greater this effect. One
might consider this as the system attempting
to avoid being far from equilibrium: the farther
it is from equilibrium, the more it curves and
goes back. Similarly, one can think of $\zeta < 0$ as
a coefficient controlling the avoidance of rapid
change: the faster the system is changing, the
more it decelerates.
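To see how the two coefficients shape a trajectory, the deterministic part of equation 14 can be integrated numerically. The sketch below is an illustration under stated assumptions: the coefficient values, starting state, step size, and the function name `simulate_oscillator` are all hypothetical, the residual term is omitted, and a standard fourth order Runge-Kutta step is used for the integration.

```python
import numpy as np

def simulate_oscillator(eta, zeta, x0=1.0, v0=0.0, dt=0.01, n_steps=5000):
    """Integrate x'' = eta * x + zeta * x' (no residual) with RK4 steps."""
    def deriv(state):
        x, v = state
        return np.array([v, eta * x + zeta * v])

    state = np.array([x0, v0], dtype=float)
    path = np.empty((n_steps + 1, 2))
    path[0] = state
    for i in range(n_steps):
        k1 = deriv(state)
        k2 = deriv(state + 0.5 * dt * k1)
        k3 = deriv(state + 0.5 * dt * k2)
        k4 = deriv(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        path[i + 1] = state
    return path

# Hypothetical values: eta < 0 pulls x back to equilibrium, zeta < 0 damps.
path = simulate_oscillator(eta=-1.0, zeta=-0.1)
x = path[:, 0]
print("largest |x| over the first 10 time units:", np.abs(x[:1000]).max())
print("largest |x| over the last 10 time units: ", np.abs(x[-1000:]).max())
```

With both coefficients negative the displacement overshoots equilibrium repeatedly while its amplitude shrinks, which is the damped oscillation described in the text.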

Suppose we have created a five-dimensional
time-delay embedding matrix $\mathbf{X}^{(5)}$ as described
above. Each of the column vectors of this
matrix is a manifest variable indicator of the
second order differential equation shown in
Figure 37.4. The latent variables in this model
are estimates of the displacement, first derivative,
and second derivative of each row of $\mathbf{X}^{(5)}$.
A three-factor confirmatory model with five
indicators is not identified when the factor
loadings are allowed to be free. In the case of the
LDE model, the loading matrix L is constrained
so as to estimate latent derivatives.

The LDE loading matrix L is a number of 
indicators by number of latent variables matrix 
whose values are constrained as follows. The 
first column of the matrix is fixed to be equal 
to one. The second column is fixed to be a unit 
basis function for a slope scaled by the interval 
between columns and with intercept at the mid- 
dle indicator. The third column is the indefinite 
integral of the second column; i.e., the third col- 
umn is the second column squared and divided 
by two. Higher order derivatives can be calcu- 
lated in the same way. Each successive column 
in L will be the indefinite integral of the previ- 
ous column. Thus in the case of a second order 
LDE model of a five-dimensional embedding, 
we can write L as 


$$
L = \begin{bmatrix}
1 & -2\tau\Delta t & (-2\tau\Delta t)^2/2 \\
1 & -\tau\Delta t & (-\tau\Delta t)^2/2 \\
1 & 0 & 0 \\
1 & \tau\Delta t & (\tau\Delta t)^2/2 \\
1 & 2\tau\Delta t & (2\tau\Delta t)^2/2
\end{bmatrix}
\tag{15}
$$

where $\Delta t$ is the elapsed time between successive
occasions of measurement and $\tau$ is the
number of occasions of measurement separating
each column of the time-delay embedded
matrix.

Figure 37.4 Path diagram of a second order latent
differential equation model
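The rule for the loading matrix (a column of ones, a centered linear basis, and each further column the integral of the previous one) can be written as a short function. This is a sketch under assumptions: the name `lde_loadings` and its arguments are introduced here for illustration, and column j is computed as the centered time offset raised to the power j and divided by j factorial, which reproduces the successive integrals for the orders shown in the text.

```python
import numpy as np
from math import factorial

def lde_loadings(d=5, tau=2, delta_t=1.0, order=2):
    """Fixed loading matrix with d rows and order + 1 columns.

    Column j holds t**j / j!, where t is the time offset of each
    embedded column measured from the middle of the embedding.
    """
    k = np.arange(d) - (d - 1) / 2.0      # -2, -1, 0, 1, 2 when d = 5
    t = k * tau * delta_t                 # centered time offsets
    return np.column_stack([t**j / factorial(j) for j in range(order + 1)])

L = lde_loadings(d=5, tau=2, delta_t=1.0, order=2)
print(L)
```

Setting `order=1` gives the 5 x 2 matrix used for the first order model, and a fourth order model would use `d=7, order=4`.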

The covariances between the displacement
$x$, the latent first derivative $\dot{x}$ and the latent
second derivative $\ddot{x}$ are used to estimate the
parameters of the second order differential equation
model in equation 14. The regression coefficients
$\eta$ and $\zeta$ appear in a 3 x 3 matrix A and
the free variances and covariances of the latent
variables appear in a matrix S.

$$
A = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
\eta & \zeta & 0
\end{bmatrix}
\tag{16}
$$

$$
S = \begin{bmatrix}
V_x & C_{x\dot{x}} & 0 \\
C_{x\dot{x}} & V_{\dot{x}} & 0 \\
0 & 0 & V_{\ddot{x}}
\end{bmatrix}
\tag{17}
$$


Again we define a 5x5 diagonal matrix E 
as in the previous section and estimate the 
expected covariance matrix R for this model as 


$$
R = L(I - A)^{-1} S \left((I - A)^{-1}\right)' L' + E
\tag{18}
$$


The five columns of $\mathbf{X}^{(5)}$ give us 15 degrees of
freedom in the data covariances. There are five
degrees of freedom used by E, two degrees of
freedom used by A and four degrees of freedom
used by S (three variances and one covariance),
leaving four degrees of freedom with
which to test the fit of the model.
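The expected covariance matrix and the degrees of freedom can be checked numerically. The following is a sketch with made-up parameter values (the numbers for the coefficients, latent variances, and residual variances are assumptions chosen only to give a valid example); it computes R from the matrix expression in equation 18 and counts the free elements of E, A, and S as they appear in equations 16 and 17.

```python
import numpy as np

def ram_covariance(L, A, S, E):
    """Model-implied covariance: L (I-A)^-1 S (I-A)^-1' L' + E."""
    B = np.linalg.inv(np.eye(A.shape[0]) - A)
    return L @ B @ S @ B.T @ L.T + E

tau, delta_t = 2, 1.0
t = np.arange(-2, 3) * tau * delta_t
L = np.column_stack([np.ones(5), t, t**2 / 2.0])   # loadings of equation 15

eta, zeta = -0.2, -0.05                            # hypothetical coefficients
A = np.zeros((3, 3))
A[2, 0], A[2, 1] = eta, zeta
S = np.array([[1.0, 0.1, 0.0],                     # hypothetical latent
              [0.1, 0.3, 0.0],                     # variances and covariance
              [0.0, 0.0, 0.05]])
E = np.diag(np.full(5, 0.2))                       # hypothetical residuals

R = ram_covariance(L, A, S, E)

n_moments = 5 * (5 + 1) // 2      # distinct elements of a 5 x 5 covariance
n_free = 5 + 2 + 4                # E diagonal, (eta, zeta), free elements of S
df = n_moments - n_free
print(R.round(3))
print("degrees of freedom:", df)
```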


4 Multivariate second order LDE 


The LDE model specified in the previous 
section was only indicated by a single variable, 


although its time structure was converted into 
a multivariate form by using time-delay embed- 
ding. A construct indicated by multiple mani- 
fest variables may also be time-delay embedded 
so as to create a data matrix that has multi- 
variate indicators each of which is multivariate 
across time. This form of time-delay embedding 
produces a better estimate of the dynamics of a 
latent construct since the differential equation 
coefficients are estimated using only the com- 
mon variance that is time structured. 

Constructing a multivariate time-delay
embedding is a straightforward extension of
the time-delay embedding procedure described
above. Suppose we have three time series
X, Y, and Z where individuals i = 1...N are
observed on occasions j = 1...P separated by
a fixed time interval s. Again we will create
a d = 5 dimensional embedding such that the
time delay between embedded columns is $\tau s$
where $\tau = 2$, thus the embedding delay is twice
the interval between occasions of measurement.
If the original time series X, Y, and Z
are ordered by occasion j within individual i,
then these series can be written as vectors of
scores

$$
\begin{aligned}
X &= \{x_{(1,1)}, \ldots, x_{(1,P)}, x_{(2,1)}, \ldots, x_{(2,P)}, \ldots, x_{(N,1)}, \ldots, x_{(N,P)}\} \\
Y &= \{y_{(1,1)}, \ldots, y_{(1,P)}, y_{(2,1)}, \ldots, y_{(2,P)}, \ldots, y_{(N,1)}, \ldots, y_{(N,P)}\} \\
Z &= \{z_{(1,1)}, \ldots, z_{(1,P)}, z_{(2,1)}, \ldots, z_{(2,P)}, \ldots, z_{(N,1)}, \ldots, z_{(N,P)}\}
\end{aligned}
\tag{19}
$$


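We first construct three embedding matrices
$\mathbf{X}^{(5)}$, $\mathbf{Y}^{(5)}$, and $\mathbf{Z}^{(5)}$ as shown previously in
equation 8. We then augment these three matrices
together $(\mathbf{X}^{(5)}|\mathbf{Y}^{(5)}|\mathbf{Z}^{(5)})$ so that the rows
align. For example, when $\tau = 2$, the augmented
time-delay embedded matrix would
take the form

$$
\mathbf{W}^{(5)} =
\begin{bmatrix}
x_{(1,1)} & \cdots & x_{(1,9)} & y_{(1,1)} & \cdots & y_{(1,9)} & z_{(1,1)} & \cdots & z_{(1,9)} \\
x_{(1,2)} & \cdots & x_{(1,10)} & y_{(1,2)} & \cdots & y_{(1,10)} & z_{(1,2)} & \cdots & z_{(1,10)} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
x_{(1,P-8)} & \cdots & x_{(1,P)} & y_{(1,P-8)} & \cdots & y_{(1,P)} & z_{(1,P-8)} & \cdots & z_{(1,P)} \\
x_{(2,1)} & \cdots & x_{(2,9)} & y_{(2,1)} & \cdots & y_{(2,9)} & z_{(2,1)} & \cdots & z_{(2,9)} \\
x_{(2,2)} & \cdots & x_{(2,10)} & y_{(2,2)} & \cdots & y_{(2,10)} & z_{(2,2)} & \cdots & z_{(2,10)} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
x_{(2,P-8)} & \cdots & x_{(2,P)} & y_{(2,P-8)} & \cdots & y_{(2,P)} & z_{(2,P-8)} & \cdots & z_{(2,P)} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
x_{(N,1)} & \cdots & x_{(N,9)} & y_{(N,1)} & \cdots & y_{(N,9)} & z_{(N,1)} & \cdots & z_{(N,9)} \\
x_{(N,2)} & \cdots & x_{(N,10)} & y_{(N,2)} & \cdots & y_{(N,10)} & z_{(N,2)} & \cdots & z_{(N,10)} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
x_{(N,P-8)} & \cdots & x_{(N,P)} & y_{(N,P-8)} & \cdots & y_{(N,P)} & z_{(N,P-8)} & \cdots & z_{(N,P)}
\end{bmatrix}
\tag{20}
$$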


Thus, the first five columns of $\mathbf{W}^{(5)}$ are the
time-delay embedded values for the variable x,
the next five columns for y, and the last five
columns for z. This data matrix can now be fit
by multivariate LDE models, such as the second
order linear model shown in Figure 37.5.
Since the multivariate LDE model is estimating
both the within-time factor structure of the
latent construct F as well as the time-delayed
structure as derivatives, the loading matrix will no
longer be entirely constrained to fixed values.
The first five rows of L are the same as in
the univariate case: all values are fixed since
both $\Delta t$ and $\tau$ are known in advance. The sixth
through tenth rows of L are simply a copy of the
first five rows scaled by the free coefficient a.
Similarly, the eleventh through fifteenth rows
are scaled by the free coefficient b, resulting in

$$
L = \begin{bmatrix}
1 & -2\tau\Delta t & (-2\tau\Delta t)^2/2 \\
1 & -\tau\Delta t & (-\tau\Delta t)^2/2 \\
1 & 0 & 0 \\
1 & \tau\Delta t & (\tau\Delta t)^2/2 \\
1 & 2\tau\Delta t & (2\tau\Delta t)^2/2 \\
a & -2a\tau\Delta t & a(-2\tau\Delta t)^2/2 \\
a & -a\tau\Delta t & a(-\tau\Delta t)^2/2 \\
a & 0 & 0 \\
a & a\tau\Delta t & a(\tau\Delta t)^2/2 \\
a & 2a\tau\Delta t & a(2\tau\Delta t)^2/2 \\
b & -2b\tau\Delta t & b(-2\tau\Delta t)^2/2 \\
b & -b\tau\Delta t & b(-\tau\Delta t)^2/2 \\
b & 0 & 0 \\
b & b\tau\Delta t & b(\tau\Delta t)^2/2 \\
b & 2b\tau\Delta t & b(2\tau\Delta t)^2/2
\end{bmatrix}
\tag{21}
$$

Figure 37.5 Path diagram of a multivariate second order latent differential equation model with three
indicators and a five-dimensional time-delay embedding

When L is constructed in this manner, a and
b will be factor loadings for the construct such 
that the factor structure is invariant over time 
and over the derivatives of F. We call this differ- 
ential factor invariance. Differential invariance 
is expected to hold when the differential equa- 
tions model is linear in its coefficients and vari- 
ables. This hypothesis can be tested by releasing 
the constraint that the factor loadings a and b 
be equal across the columns of L. 
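Because each block of the multivariate loading matrix is the univariate block multiplied by a single loading, the whole matrix is a Kronecker product. The sketch below illustrates this under assumptions: the loading values for a and b are hypothetical (in a fitted model they would be free parameters), and the function name `multivariate_loadings` is introduced here.

```python
import numpy as np

tau, delta_t = 2, 1.0
t = np.arange(-2, 3) * tau * delta_t
L_uni = np.column_stack([np.ones(5), t, t**2 / 2.0])   # univariate block

def multivariate_loadings(L_uni, free_loadings):
    """Stack copies of L_uni scaled by 1 and by each free factor loading."""
    scale = np.concatenate([[1.0], free_loadings]).reshape(-1, 1)
    return np.kron(scale, L_uni)

a, b = 0.8, 1.2                     # hypothetical values of the free loadings
L_multi = multivariate_loadings(L_uni, [a, b])
print(L_multi.shape)
```

Releasing the invariance constraint would correspond to letting each column of a block have its own multiplier rather than the shared a or b.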

The structural part of this multivariate sec- 
ond order linear LDE is exactly the same as the 
univariate case from equations 16 and 17 shown 
above, although we have now labeled our con- 
struct as F. 


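$$
A = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
\eta & \zeta & 0
\end{bmatrix}
\tag{22}
$$

$$
S = \begin{bmatrix}
V_F & C_{F\dot{F}} & 0 \\
C_{F\dot{F}} & V_{\dot{F}} & 0 \\
0 & 0 & V_{\ddot{F}}
\end{bmatrix}
\tag{23}
$$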


Again, the predicted covariance of the multi- 
variate model can be calculated as shown in 
equation 18. 
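The augmented data matrix of equation 20 can be assembled by embedding each variable separately within each person and joining the results side by side. The sketch below uses simulated placeholder data and hypothetical function names (`time_delay_embed`, `embed_multivariate`); it is meant only to show the row and column layout.

```python
import numpy as np

def time_delay_embed(series, d=5, tau=2):
    """Rows are series[r], series[r + tau], ..., series[r + (d - 1) * tau]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - (d - 1) * tau
    offsets = np.arange(d) * tau
    return np.stack([series[r + offsets] for r in range(n_rows)])

def embed_multivariate(persons, d=5, tau=2):
    """persons: list of (P, n_vars) arrays, one array per individual."""
    blocks = []
    for data in persons:
        data = np.asarray(data, dtype=float)
        per_variable = [time_delay_embed(data[:, v], d, tau)
                        for v in range(data.shape[1])]
        blocks.append(np.hstack(per_variable))   # (X | Y | Z) for one person
    return np.vstack(blocks)                     # persons stacked by rows

# Placeholder data: 4 individuals, P = 12 occasions, 3 variables each.
rng = np.random.default_rng(0)
persons = [rng.normal(size=(12, 3)) for _ in range(4)]
W5 = embed_multivariate(persons)
print(W5.shape)
```

Each person contributes P - 8 = 4 rows here, and the 15 columns are the five embedded columns of each of the three variables in turn.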


5 LDE model extensions 


The latent structure of LDE models is flexi- 
ble with respect to how the differential equa- 
tion or equations are specified. For instance, a 
fourth order model can be created by adding 
two extra columns to the time-delay embed- 
ded data matrix, and setting up the L matrix to 
have 7 rows and 5 columns (Boker, 2007). Or 
coupled differential equations can be created 
by augmenting two time-delay embedded matri- 
ces together and setting up a structural model 
where two latent variables are intrinsically reg- 
ulated as well as bidirectionally coupled: 


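$$
\ddot{x}(t) = \eta_x x(t) + \zeta_x \dot{x}(t) + \gamma_y\left(\eta_y y(t) + \zeta_y \dot{y}(t)\right) + e_x(t)
\tag{24}
$$

$$
\ddot{y}(t) = \eta_y y(t) + \zeta_y \dot{y}(t) + \gamma_x\left(\eta_x x(t) + \zeta_x \dot{x}(t)\right) + e_y(t)
\tag{25}
$$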


The second derivatives of x and y are each regulated
both by their own intrinsic process and
by a linear proportion of the other variable's
regulation. Systems like these have been used to
model dyadic relationships in married couples
(Boker and Laurenceau, in press).
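A small simulation can make the coupling concrete. The sketch below assumes the coupled form in which each second derivative equals the variable's own regulation plus a proportion (gamma) of the other variable's regulation, omits the residual terms, and uses hypothetical parameter values and function names throughout. Starting with y at rest, y moves only when the coupling from x is nonzero.

```python
import numpy as np

def coupled_deriv(state, p):
    """Derivatives of (x, x', y, y') for two coupled linear oscillators."""
    x, dx, y, dy = state
    own_x = p["eta_x"] * x + p["zeta_x"] * dx    # intrinsic regulation of x
    own_y = p["eta_y"] * y + p["zeta_y"] * dy    # intrinsic regulation of y
    return np.array([dx, own_x + p["gamma_y"] * own_y,
                     dy, own_y + p["gamma_x"] * own_x])

def simulate(state0, p, dt=0.01, n_steps=2000):
    """Fourth order Runge-Kutta integration of the coupled system."""
    state = np.asarray(state0, dtype=float)
    path = np.empty((n_steps + 1, 4))
    path[0] = state
    for i in range(n_steps):
        k1 = coupled_deriv(state, p)
        k2 = coupled_deriv(state + 0.5 * dt * k1, p)
        k3 = coupled_deriv(state + 0.5 * dt * k2, p)
        k4 = coupled_deriv(state + dt * k3, p)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        path[i + 1] = state
    return path

# Hypothetical parameter values; x starts displaced, y starts at rest.
params = dict(eta_x=-1.0, zeta_x=-0.1, eta_y=-0.5, zeta_y=-0.1,
              gamma_x=0.3, gamma_y=0.3)
start = [1.0, 0.0, 0.0, 0.0]
coupled = simulate(start, params)
uncoupled = simulate(start, {**params, "gamma_x": 0.0, "gamma_y": 0.0})
print("max |y| with coupling:   ", np.abs(coupled[:, 2]).max())
print("max |y| without coupling:", np.abs(uncoupled[:, 2]).max())
```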


6 Limitations and recommendations 


There are limitations to the methods described 
in this chapter. The first limitation is that there 
must be intensive longitudinal measurement in 
order to estimate parameters of all but the sim- 
plest of differential equations. As the differ- 
ential equation models become more complex, 
the need for intensive measurement per indi- 
vidual increases. Nonlinear models in partic- 
ular can require hundreds or even thousands 
of measurements per individual. While this 
is within the realm of possibility for physio- 
logical data, thousands of questionnaires per 
individual is beyond the scope of behavioral 
research. 

LDE and time-delay embedding can be used 
with as few as five observations per person, 
but unless there are large interindividual dif- 
ferences in regulation or the model is a simple 
first order model, greater power is gained from 
five additional measurements of that individual 
than is gained by adding a new individual with 
five measurements. The reason is that the new 
individual only adds one observation of the 
relationship between the columns, but adding 
five observations to an existing individual adds 
five rows to the time-delay embedding matrix. 
For the same number of measurements, inten- 
sive longitudinal measurement almost always 
will be more powerful than short bursts when 
testing differential equations models. 
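The row-count argument can be made explicit. Assuming d = 5 columns and a delay of one occasion (the setting under which five observations yield exactly one row), a person observed on P occasions contributes P - (d - 1) times the delay rows. The helper name below is hypothetical.

```python
def embedded_rows(P, d=5, tau=1):
    """Rows one person contributes to the time-delay embedded matrix."""
    return max(P - (d - 1) * tau, 0)

# Ten measurements in total, allocated in two different ways.
two_people_five_each = 2 * embedded_rows(5)
one_person_ten = embedded_rows(10)
print(two_people_five_each, one_person_ten)
```

Under these assumptions, ten measurements spread over two people give two rows, while the same ten measurements on one person give six.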

Currently, standard errors for LDE models
estimated parametrically from SEM packages
are incorrect. Nonindependence of rows in
time-delay embedded data matrices violates the
assumptions on which these standard errors
are calculated. Structured bootstrap methods
are currently under development in order to
address this problem. Until these methods have
been verified, nested model comparison using
fit statistics is recommended.


7 Conclusions 


The methods reviewed and presented in 
this chapter are relatively new in behav- 
ioral research. These methods can be use- 
fully applied, but require more explanation 
than tried and true longitudinal methods. This 
places a burden on researchers reporting results 
from these methods to describe the implica- 
tions of their models in meaningful theoretic 
terms. However, differential equations models 
can lead to insights in behavioral regulation that 
may not be apparent when using other statisti- 
cal techniques. 

Differential equations provide a way to frame 
theories of regulation and time dependency that 
are flexible and result in parameters that have 
useful theoretic interpretations. It can be help- 
ful to think through one’s theory in terms of, 
“What causes change?” and “How does this 
system regulate itself as opposed to being reg- 
ulated by some extrinsic variable?” Framing 
one’s theory in these terms has led naturally 
to specifying and fitting differential equations 
models for emotional regulation, ovarian hormones
coupled to disordered eating, perception-action
systems in posture, cognitive aging,
interpersonal communication, and social inter- 
action. It is expected that differential equations 
models will find their way into wide usage in 
the behavioral sciences in the coming decades.


Chapter 38


Nonlinear dynamics, chaos, 
and catastrophe theory 


Courtney Brown 


Nonlinear dynamics refers to a broad range of 
behavior that can occur with many varieties of 
mathematical models that are structured with 
respect to time. Interest in nonlinear dynamics 
has increased in recent years in the social sci- 
ences, in large part fueled by frustrations with 
the limitations associated with linear regression 
models. Nonlinear dynamics can occur with 
dynamic models that are algebraically linear 
or nonlinear. For example, both of the models 
dy/dt = ay and dy/dt = ay^2 (where the parameter
"a" is a constant) express nonlinear longitudinal
behavior, yet the first is algebraically
linear whereas the second is algebraically non- 
linear. Minimally, nonlinear dynamics require 
that change occurs over time with respect to 
one or more variables such that this change can- 
not be represented as a straight line on a graph 
that places time on the horizontal axis. More 
specifically, nonlinear variation is on-going 
change that is not a constant increment with 
respect to time. Chaos and catastrophe theory 
are two subcategories within the area of non- 
linear dynamics that address particular types 
of highly nonlinear behaviors peculiar to cer- 
tain classes of functionally nonlinear dynamic 
models. 


1 Nonlinear dynamics 


The term “nonlinear dynamics” is often used 
within the context of complex situations in 
which phenomena with longitudinal change 
are modeled using nonlinear algebraic formu- 
lations (often involving systems of interde- 
pendent equations) that either implicitly or 
explicitly reference time. However, readers 
should note that linear dynamic models that 
exhibit nonlinear overtime behaviors have his- 
torically played an important role in the devel- 
opment of our contemporary understanding of 
nonlinear dynamics, including the nonlinear 
dynamics associated with algebraically nonlin- 
ear models. The two-nation linear arms race 
model of Lewis Fry Richardson is a prominent 
example of this (Richardson, 1960). A thorough 
mathematical introduction to nonlinear dynam- 
ics as it appears in both linear and nonlin- 
ear mathematical specifications can be found 
in Hirsch, Smale and Devaney (2003; also see 
Hirsch and Smale, 1974). Treatments and exam- 
ples of this same subject matter from a social 
science perspective can be found in Brown 
(2007a, 1995a, 1995b, 1991). Also of interest, a 
seminal work edited by Diana Richards (2000) 
presents a collection of papers representing a
large variety of nonlinear applications in the
social sciences. 

In the physical and natural sciences, non- 
linear dynamics are normally encountered 
using continuous-time differential equation 
model specifications, which is a consequence 
of continuous-time processes of change that 
are naturally encountered in these fields. 
Exceptions are not rare, however, especially in 
the biological sciences, in which generational 
or seasonal changes are the focus of study. In
such instances, difference equations are sometimes
employed. In the social sciences, nonlinear 
dynamics are often modeled using difference 
equations, almost always due to the manner 
in which social scientific data are collected 
(e.g., periodic elections, decade-spaced cen- 
sus reports, periodically-spaced survey data, 
etc.). Exceptions do occur here as well, and 
the technical difficulties involved in using 
continuous-time models with discretely mea- 
sured data sets as are commonly encountered 
in the social sciences now have clear solutions 
(e.g., see Brown, 1995b). The choice of using 
either a continuous- or a discrete-time approach 
to modeling social phenomena has substantive 
consequences that can be important in certain 
settings, and a discussion of these substantive 
consequences can be found in Brown (1995b, 
pp. 13-30). 

Nonlinear dynamics are normally described 
in terms of the following processes of change: 
(1) regular, (2) periodic, (3) chaotic, and 
(4) catastrophe. A regular process begins with 
a bifurcation, or a point in which a new pro- 
cess of change begins that differs structurally 
from a previous process of change. This bifur- 
cation is normally followed by growth that ini- 
tially experiences positive feedback that later 
changes to negative feedback as the growth process
slows. The point at which positive feedback
switches to negative feedback is called an
inflection point, and in simple models this is
where the second derivative switches in sign from
positive to negative. The growth eventually


tapers off entirely as the model asymptotically 
approaches an equilibrium, which is a constant 
steady state. At equilibrium, there is zero net 
change. Because of other processes of change 
that are temporarily external to this regular pro- 
cess, the regular process eventually becomes 
“ripe” for experiencing another bifurcation, 
which can lead to the initiation of a new pro- 
cess of change that can be any of the four listed 
above (including a new regular process). 

A classic example of a regular process involv- 
ing nonlinear dynamics is the logistic equation 
dy/dt = ay(k —y), where growth in the variable 
y continues smoothly as its values asymptoti- 
cally approach the equilibrium value k. This 
model is represented here as Figure 38.1, and 
the equilibrium value in this figure is 1.6. This 
model is functionally nonlinear since it contains
a power term (y^2). The nonlinear dynamics
aspect of the model is evidenced by the
S-shape of the curve over time. 
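The S-shape can be reproduced by stepping the logistic equation forward numerically. The sketch below is illustrative only: the growth parameter a, the starting value, the step size, and the function name are assumptions, and only the equilibrium k = 1.6 is taken from the text. A simple Euler step is used, which is adequate here because the step is small relative to the growth rate.

```python
import numpy as np

def logistic_path(a, k, y0, dt=0.01, n_steps=3000):
    """Euler integration of dy/dt = a * y * (k - y)."""
    y = np.empty(n_steps + 1)
    y[0] = y0
    for i in range(n_steps):
        y[i + 1] = y[i] + dt * a * y[i] * (k - y[i])
    return y

y = logistic_path(a=1.0, k=1.6, y0=0.01)   # a and y0 are hypothetical
growth = np.diff(y)                        # change per step
fastest = y[np.argmax(growth)]             # value of y where growth peaks
print("final value:", y[-1])
print("value of y at fastest growth:", fastest)
```

Growth accelerates while y is small, is fastest near half the equilibrium value, and then slows as y approaches k.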

Figure 38.1 The logistic curve

A periodic process is one in which the values
of the relevant variables recur at specified intervals.
Periodic processes normally have periodic
limit cycles, as compared with fixed-point equilibria
that are typically associated with regular
processes. Given the fact that humans live lives
according to countless cycles, it is surprising
that periodic influences are not more commonly 
studied in the social sciences (relative to other 
types of analyses using, say, linear regression to 
find correlations between variables). For exam- 
ple, humans go to sleep and wake up perio- 
dically. We go to school in seasons. We eat 
with regular periodicity, breakfast, lunch, 
and dinner. We participate in elections in 
regular intervals. Our censuses are conducted 
in regular intervals. We pay our taxes yearly, 
and on and on. 

A useful example of a periodic process is vot- 
ing for the US Congress. Every two years there 
is a congressional election. But every four years 
there is also a presidential election. Interest 
in the presidential election increases turnout 
for the congressional contest every four years. 
This results in a surge in voter mobilization 
for Congress every four years in the so-called 
“on-year” elections. Subsequent elections two 
years later (the “off-year” elections) experience 
a decline in voter turnout due to the absence 
of a simultaneous presidential contest. If one 
were to model this, one might use a linear 
functional form originally suggested to me by 
John Sprague, 


$$
M_{t+1} = aM_t + b \tag{1}
$$


where $M_t$ is the proportion of the eligible electorate
that votes at time t, and a and b are parameters
of the model. This is a first-order linear
difference equation with constant coefficients.

For equation (1) to demonstrate the type of 
periodicity that is required given the structure 
of the US electoral calendar, parameter a must 
equal —1. That is the only value for that para- 
meter that will produce finite oscillations of the 
type required for this electoral setting. Thus, 
we know the value of parameter a in advance 
without having to estimate it, and the only 
parameter that can be estimated for this model 
is parameter b. If one estimates this model 
using US congressional mobilization data from
1950 to 1970, then parameter b = 0.9931, and
$R^2 = 0.68$.
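Iterating the difference equation shows why a must equal -1. In the sketch below, b = 0.9931 is the estimate reported above, while the starting turnout of 0.60 and the alternative values of a are hypothetical choices made for illustration; the function names are also introduced here.

```python
def iterate_difference_equation(m0, a, b, n_steps):
    """Iterate M[t+1] = a * M[t] + b starting from m0."""
    path = [m0]
    for _ in range(n_steps):
        path.append(a * path[-1] + b)
    return path

def equilibrium(a, b):
    """Fixed point of the difference equation (requires a != 1)."""
    return b / (1.0 - a)

b = 0.9931    # estimate reported in the text
m0 = 0.60     # hypothetical on-year turnout used as a starting value

sustained = iterate_difference_equation(m0, -1.0, b, 6)
damped = iterate_difference_equation(m0, -0.9, b, 40)
explosive = iterate_difference_equation(m0, -1.1, b, 40)

print([round(m, 4) for m in sustained])
print(abs(damped[-1] - equilibrium(-0.9, b)))
print(abs(explosive[-1] - equilibrium(-1.1, b)))
```

With a = -1 the series alternates between the starting value and b minus that value indefinitely. A value of a between -1 and 0 makes the oscillation die out toward the equilibrium, and a value below -1 makes it grow without bound.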

The above model is best interpreted graphi- 
cally. Figure 38.2 is constructed by first using 
the above values for the parameters a and b to 
calculate a predicted time series for $M_t$ with
respect to equation (1). Then these values of
$M_t$ are plotted on top of the actual data. The
historical data are represented as dots, and the 
model is represented as the saw-toothed line. 

Note in Figure 38.2 that the finite oscilla- 
tions of the model closely correspond with 
the oscillatory characteristic of the data. Note 
also that this saw-toothed line clearly portrays 
the underlying periodic nature of these data. 
Remember that this model is linear in its func- 
tional form, yet it is still capable of express- 
ing nonlinear dynamics of a periodic nature. 
Note also that the data for this representation 
ends in 1970, and this was done by design for 
heuristic reasons. This is because 18- to 20-year-olds
were allowed to vote in the US after 1970,
and these young voters tended to vote in lower
proportions as compared with older Americans.
Thus, the variable $M_t$ changed in its nature after
1970. More specifically, $M_t$ is a ratio, and after
1970 the denominator (total eligibles) got larger,
but the numerator (total voters) did not grow
equivalently. This means that the saw-toothed 
pattern found in Figure 38.2 continued after 
1970, but at a lower overall level. In essence, 
this “dropped” or “bent” the pattern a bit after 
1970, and a simple first-order linear difference 
equation model expressing finite oscillations 
could not capture this bend. A more sophis- 
ticated model—potentially a functionally non- 
linear model—would have been able to capture 
this greater level of complexity, however. 

A chaotic process differs from a periodic pro- 
cess in that changes in the values of the relevant 
variables never recur exactly, and this lack of 
repetition is not a consequence of a stochastic 
element. Chaotic processes often exhibit trajec- 
tories with dramatically diverse variations in 
variable values despite very small differences 
in initial conditions. Thus, chaotic processes 
lack periodic limit cycles, although variable 
variations typically “hover” within an iden- 
tifiable neighborhood of an unstable equilib- 
rium (a “strange attractor”), as I explain more 
thoroughly below. 

Catastrophe processes experience sudden 
and dramatic changes that depart from 
previously existing dynamic processes. An 
earthquake is an obvious candidate for a phe- 
nomenon that can be modeled as a catastrophe 
process because incremental (i.e., regular pro- 
cess) changes in the positions of the tectonic 
plates eventually lead to a sudden departure 
from a previously established, pressure-defined 
equilibrium. In essence, a catastrophe process 
is such that the previously dominant equili- 
brium and its basin disappear, and a new equi- 
librium and its basin becomes dominant for 
the dynamical system. Examples of catastrophe 
models using social scientific data can be found 
in Brown (1995a, 1995b). 


2 Competition and cooperation 


Before going into greater depth with regard 
to chaos and catastrophe theories, it is worth 
describing the primary mechanisms by which 


nonlinear dynamics are commonly expressed 
in many models of social and political change. 
Algebraic formulations of social and political 
processes typically reference (either in isola- 
tion or combination) the ideas of competition 
and cooperation (see, e.g., Crosby, 1987). Most 
competition and cooperation formulations are 
expressed in terms of social systems. 

Competitive processes are not goal seeking. 
Individuals and groups act selfishly to pursue 
their own interests in systems that are competi- 
tive in nature. Competitive systems are also 
nonlinear dynamically, in the sense that com- 
peting actors pursue locally-defined interests 
and goals. In a mathematical sense, this refers 
to the values of the partial derivatives of any 
particular system’s Jacobian matrix. Such par- 
tials are always evaluated locally within a phase 
space, and their values typically change dra- 
matically with respect to changes in the values 
of both the parameters and the state variables 
(as when one moves around in phase space). 
But competitive systems are also often (but not 
always) nonlinear in their functional form as 
well (in particular, see Brown, 2007). 

Sometimes competitive systems can be hos- 
tile in nature, such as a nuclear arms race 
that threatens the survival of the planet. In 
other instances, such systems can be univer- 
sally benign such that the overall environment 
within which the competition takes place is not 
harmed by the competition. 

Cooperative systems are fundamentally dif- 
ferent from competitive systems. Cooperative 
systems work to achieve collective goals. Indi- 
vidual actors within the system do not act with- 
out regard for the remainder of society. Thus, 
one can say that cooperative systems are goal 
seeking. In order for cooperative systems to 
achieve collective goals, it is necessary for ele- 
ments of the system to be organized such that 
they are each dependent on one another. This 
results in a level of specialization among the 
various elements within society. Since there 
are collective goals, there must also be some 




centralization of power, which in turn implies 
some level of hierarchical control. 

How does all of this relate to the subject 
of nonlinear dynamics? Both cooperative and 
competitive systems can experience nonlinear 
dynamics. However, cooperative systems have 
greater potential for experiencing linear dynam- 
ics. This is because it is possible for a cooper- 
ative system to have smooth (i.e., incremental) 
longitudinal change as one of its goals. Since a 
competitive system is locally dominated, non- 
linear dynamics of any level of complexity can 
arise with great regularity. Within the social sci- 
ences, chaos and catastrophes are examples of 
highly nonlinear dynamics that can easily find 
residence within competitive systems. 

Does this imply that cooperative social and 
political systems tend to express change as 
linear dynamics? The answer to this is unequiv- 
ocally no, and this is due to the heterogeneous 
nature of most systems. In real life there are very 
few purely cooperative social systems. Most 
systems are hybrids with both competitive and 
cooperative elements. For example, a collective 
goal of reducing air pollution can be sought by 
society, but we may attempt to obtain this goal 
by instituting a competitive system involving 
“pollution permits” that are allocated to indus- 
tries. Owners of some of the newer factories 
may be able to produce less pollution than their 
allotment, thereby allowing them to sell their 
excess permits to the owners of older factories 
who find it cheaper to buy such permits than 
to renovate their facilities. Such hybrid regu- 
latory practices are called “marketplace” solu- 
tions because they allow factories to compete 
with one another while maintaining an eye on 
the overall level of allowed pollution. 


3 Chaos theory 


Chaos theory refers to a type of behavior 
with nonlinear dynamics that is both irregu- 
lar and oscillatory. The study of chaos the- 
ory has assumed enormous importance in the 
physical and natural sciences, and it is now 


increasingly investigated with respect to the 
social sciences. For example, a recently pub- 
lished seminal volume edited by Kiel and 
Elliott (1996) presented a large variety of papers 
addressing both the theory and application of 
chaos theory in the social sciences. Chaos is 
encountered mathematically with certain sets 
of nonlinear deterministic dynamic models in 
which patterns of overtime behavior are not 
repeated no matter how long the model con- 
tinues to operate. Some discrete-time mod- 
els using nonlinear difference equations are 
known to exhibit chaotic dynamics under cer- 
tain conditions using only one equation. How- 
ever, in continuous-time models, the possibility 
of chaos normally requires a minimum of three 
independent variables, which usually requires 
an interdependent system of three differential 
equations. This requirement for continuous- 
time models can be changed in special cases 
if the system also has a forced oscillator or 
time lags. 

Chaos can occur only in nonlinear situations. 
In multidimensional settings, this means that 
at least one term in one equation must be non- 
linear while also involving several of the vari- 
ables. Since most nonlinear models (and nearly 
all of the substantively interesting ones) have 
no analytical solutions, they must be investi- 
gated using numerically intensive methods that 
require computers (see Brown, 2007). Since 
the dimensionality and nonlinearity require- 
ments of chaos do not guarantee its appear- 
ance, chaotic behavior is typically discovered 
in a model through computational experimen- 
tation that involves finding variable ranges and 
parameter values that cause a model to display 
chaotic properties. 

Chaotic processes are quite common in real 
physical systems, and such processes may also 
be common to social systems, even though 
our present ability to identify and model such 
processes is still developing. Discovering real 
chaotic processes in physical and social sys- 
tems is often quite difficult since stochastic 
noise is nearly always present as well in
such systems, and it is not easy to separate 
truly random behavior from chaotic behavior. 
Nonetheless, mathematical tools continue to be 
developed that are aimed at sorting out these 
processes and issues. 

Chaos has three fundamental characteristics. 
They are (a) irregular periodicity, (b) sensiti- 
vity to initial conditions, and (c) a lack of pre- 
dictability. These characteristics interact within 
any one chaotic setting to produce highly com- 
plex nonlinear variable trajectories. Irregular 
periodicity refers to the absence of a repeated 
pattern in the oscillatory movements of the 
chaotically driven variables. Because of the 
irregular periodicity, Fourier analysis, graphing 
techniques, and other methods are commonly 
used to build a case for identifying chaotic pro- 
cesses (see Brown, 1995a). 

A nonlinear model that has been among the 
most well studied with regard to chaos in dis- 
crete settings is a general form of a logistic map, 
and its chaotic properties were initially investi- 
gated by May (1976). This general logistic map 
is $Y_{t+1} = aY_t(1 - Y_t)$. Under the right conditions,
this map can produce the standard S-shaped 
trajectory that is the trademark of the logistic 
process. However, oscillations in the trajectory 
occur when the value of the parameter a is suf- 
ficiently large. For example, when the value of 
parameter a is set to 2.8, the trajectory of the 
model oscillates around the equilibrium value 
of $Y_{t+1} = Y_t = Y^*$ while it converges asympto-
tically toward this equilibrium limit. But when
the value of the parameter a is set equal to, say, 
4.0, the resulting longitudinal trajectory never 
settles down toward the equilibrium limit, and 
instead continues to oscillate irregularly around 
the equilibrium in what seems to be a ran- 
dom manner that is caused by a deterministic 
process. 
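
The contrast between a = 2.8 and a = 4.0 can be reproduced in a few lines. The following is only a minimal sketch: the function names, the starting value of 0.1, and the number of iterations are illustrative assumptions and are not taken from May (1976). The nonzero equilibrium used for comparison follows from solving Y* = aY*(1 - Y*), which gives Y* = 1 - 1/a.

```python
def logistic_trajectory(a, y0=0.1, n=200):
    """Iterate Y[t+1] = a * Y[t] * (1 - Y[t]) and return the whole path."""
    path = [y0]
    for _ in range(n):
        y = path[-1]
        path.append(a * y * (1.0 - y))
    return path


def equilibrium(a):
    """Nonzero fixed point of the map: Y* = 1 - 1/a."""
    return 1.0 - 1.0 / a


convergent = logistic_trajectory(2.8)   # damped oscillation toward Y*
chaotic = logistic_trajectory(4.0)      # never settles down

# Distance of the last iterate from the equilibrium when a = 2.8.
print(abs(convergent[-1] - equilibrium(2.8)))
# Range still covered by the last 50 iterates when a = 4.0.
print(min(chaotic[-50:]), max(chaotic[-50:]))
```

With a = 2.8 the successive deviations from Y* alternate in sign and shrink, which is the oscillatory convergence described above; with a = 4.0 the late iterates remain spread over most of the unit interval.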

Figure 38.3 is a time series plot of the logis-
tic map with parameter a set to 2.8. Note how
the value of the variable $Y_{t+1}$ settles down to
an equilibrium point. This type of nonlinear
behavior is oscillatory and convergent. Another
way of looking at this behavior is with a stair-
step diagram, and this is done in Figure 38.4.
(See Brown, 1995a, for a discussion of how
stair-step diagrams are constructed and inter-
preted.) Again note the convergence to the equi-
librium point as the values of the first iterates
become equal.


Figure 38.3 Time series of logistic map
without chaos


Figure 38.4 One-dimensional stair-step diagram of
logistic map without chaos

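
A stair-step diagram of this kind is commonly built by alternating two moves: a vertical move from the current value to the curve of the map, and a horizontal move from the curve to the 45-degree line. The sketch below generates only the corner points of such a diagram for the logistic map; the function name `cobweb_vertices` and the starting value are hypothetical choices made for illustration, and plotting is left to whatever graphics tool is at hand.

```python
def cobweb_vertices(a, y0, n):
    """Corner points (horizontal, vertical) of a stair-step diagram
    for the logistic map f(Y) = a * Y * (1 - Y)."""
    points = [(y0, 0.0)]                  # start on the horizontal axis
    y = y0
    for _ in range(n):
        y_next = a * y * (1.0 - y)
        points.append((y, y_next))        # vertical move to the curve
        points.append((y_next, y_next))   # horizontal move to the 45-degree line
        y = y_next
    return points


pts = cobweb_vertices(2.8, 0.1, 100)
print(pts[:3])   # first corners of the staircase
print(pts[-1])   # last corner, close to (Y*, Y*) when a = 2.8
```

When the map converges, the corners spiral in on the point where the curve crosses the 45-degree line; when it does not, the staircase keeps wandering across the diagram.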
Setting parameter a = 4, the time series
plot changes dramatically with the emergence
of chaos. This is seen in Figure 38.5. Time
series plots are very hard to interpret in the
presence of chaos. For this reason, stair-step
diagrams are much more useful when chaos
exists. Figure 38.6 is a stair-step diagram of
the logistic map with parameter a = 4. From
Figure 38.6 it is clear that the first iterates are
not converging, which means that no single sta-
ble equilibrium exists for the system. Another
way of saying this is that there is no conver-
gence (periodic or otherwise) to a steady state.
The discovery that such a simple model could
exhibit chaotic properties was revolutionary to
the subject of nonlinear dynamics. It implies
that chaos may be quite common in nature, and
human behavior would not be exempt from this.


Figure 38.5 Time series of logistic map with chaos


Figure 38.6 One-dimensional stair-step diagram
of logistic map with chaos
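
Sensitivity to initial conditions, the second characteristic of chaos listed earlier, can be illustrated with the same map. The sketch below runs two trajectories whose starting values differ by an assumed offset of 1e-10 and records the gap between them; the function names, the offset, the starting value of 0.2, and the run length of 60 iterations are all illustrative assumptions.

```python
def logistic(a, y):
    return a * y * (1.0 - y)


def separation(a, y0, delta=1e-10, n=60):
    """Track |Y_t - Y'_t| for two runs whose starting values differ by delta."""
    y, z = y0, y0 + delta
    gaps = []
    for _ in range(n):
        y, z = logistic(a, y), logistic(a, z)
        gaps.append(abs(y - z))
    return gaps


gaps_chaos = separation(4.0, 0.2)    # chaotic regime
gaps_stable = separation(2.8, 0.2)   # convergent regime

print(max(gaps_chaos))    # the tiny initial gap grows to the scale of Y itself
print(max(gaps_stable))   # the gap stays tiny and then dies away
```

In the chaotic case the two runs become effectively unrelated within a few dozen iterations, which is why long-range prediction fails even though the model is fully deterministic.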

The most famous continuous-time model 
that exhibits chaotic behavior is the so-called 
“Lorenz attractor” (Lorenz, 1963). This model 
is an interdependent nonlinear system involv- 
ing three first-order differential equations, and 
it was originally used to analyze meteorological 
phenomena. The three equations are presented 
here as equations (2), (3), and (4). 


$$\frac{dx}{dt} = s(y - x) \tag{2}$$
$$\frac{dy}{dt} = rx - y - xz \tag{3}$$
$$\frac{dz}{dt} = xy - bz \tag{4}$$


In this system, the three state variables are 
x, y, and z, whereas s, r, and b are parame- 
ters of the model. Most investigations of this 
system hold the parameters s and b constant 
while varying the value of r. The parameter r 
can be any positive number, but when r < 1, 
the origin is globally attracting, which means 
that there are no other competing attractors. 
When r > 1, there are three zero vectors (equi-
librium points) that are found by setting the
three derivatives in equations (2), (3), and (4)
equal to zero and then solving for the resultant
state variables. However,
now the origin is no longer a stable equilibrium 
point. Rather, all nearby trajectories in the sys- 
tem’s three-dimensional phase space now move 
away from the origin. When the parameter r is 
set at certain values, the other two zero vectors 
form what is called a “strange attractor.” The 
appearance of a strange attractor is a character- 
istic of chaos in such settings. In this case, the 
strange attractor forms a basin of attraction that 
draws trajectories into its neighborhood. But 
once trajectories arrive in the neighborhood of 
the strange attractor, the competing basins asso-
ciated with each of the non-origin zero vectors
interact to form an unstable area of phase space. 
In this area, trajectories forever move from one 
basin to the other and then back again, each 
time orbiting one zero vector before moving off 
to re-orbit the other zero vector. Moreover, the 
orbits never settle down into a periodic limit 
cycle. Using values suggested by Lorenz, we 
can set s=10 and b=8/3. Now, if r is set to 
28, then the Lorenz system displays chaos. This 
situation can be seen in Figure 38.7. 
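
The behavior described above can be explored numerically. Setting equations (2), (3), and (4) to zero gives y = x, z = x²/b, and x(r - 1 - z) = 0, so for r > 1 the two non-origin zero vectors are x = y = ±√(b(r - 1)), z = r - 1. The sketch below checks these points and integrates the system with a classical fourth-order Runge-Kutta scheme. The integrator, the step size of 0.01, the starting point (1, 1, 1), and all function names are assumptions made for illustration; they are not part of Lorenz's original presentation.

```python
import math


def lorenz(state, s=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand sides of equations (2), (3), and (4)."""
    x, y, z = state
    return (s * (y - x), r * x - y - x * z, x * y - b * z)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(v + 0.5 * dt * k for v, k in zip(state, k1)))
    k3 = f(tuple(v + 0.5 * dt * k for v, k in zip(state, k2)))
    k4 = f(tuple(v + dt * k for v, k in zip(state, k3)))
    return tuple(
        v + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        for v, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4)
    )


def nonzero_equilibria(r=28.0, b=8.0 / 3.0):
    """For r > 1: x = y = +/- sqrt(b (r - 1)), z = r - 1."""
    c = math.sqrt(b * (r - 1.0))
    return [(c, c, r - 1.0), (-c, -c, r - 1.0)]


state = (1.0, 1.0, 1.0)          # assumed starting point
path = [state]
for _ in range(5000):            # 50 time units with dt = 0.01
    state = rk4_step(lorenz, state, 0.01)
    path.append(state)

xs = [p[0] for p in path[1000:]]  # discard an initial transient
print(min(xs), max(xs))           # x takes both signs: both lobes are visited
```

Plotting x against z for the stored path gives the familiar two-lobed picture, with the trajectory circling one non-origin zero vector for a while before crossing over to circle the other.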

There are many ways to analyze chaos. 
Some methods work with systems of equa- 
tions that produce chaos. Other methods work 
with historical data by trying to discern if 
a deterministic chaotic process is present in 
combination with authentic random noise. 
Since chaos essentially mimics random noise, 
finding chaos in any body of data is a bit 
of a delicate art. Nonetheless, sophisticated 
methods do exist that accomplish just this. A 
review of some of these methods can be found 
in Brown (1995a). A useful discussion on how 
to use Lyapunov exponents while searching 
for chaos in small data sets can be found in 
Rosenstein, Collins and De Luca (1993); see 
also Hilborn (1994). 
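
When the generating equation is known, the largest Lyapunov exponent of a one-dimensional map can be estimated directly as the long-run average of ln|f'(Y_t)| along a trajectory; a positive value indicates that nearby trajectories diverge exponentially. The sketch below applies this to the logistic map Y_{t+1} = aY_t(1 - Y_t), whose derivative is a(1 - 2Y_t). It is not the algorithm of Rosenstein, Collins and De Luca (1993), which works from observed data without knowledge of the map; the function name, run lengths, and starting value are illustrative assumptions.

```python
import math


def lyapunov_logistic(a, y0=0.2, n=20000, burn_in=1000):
    """Average of ln|f'(Y_t)| along an orbit of Y[t+1] = a*Y[t]*(1 - Y[t])."""
    y = y0
    for _ in range(burn_in):          # let transients die out first
        y = a * y * (1.0 - y)
    total = 0.0
    for _ in range(n):
        slope = abs(a * (1.0 - 2.0 * y))
        total += math.log(max(slope, 1e-300))   # guard against log(0)
        y = a * y * (1.0 - y)
    return total / n


print(lyapunov_logistic(2.8))   # negative: trajectories converge
print(lyapunov_logistic(4.0))   # positive: trajectories diverge
```

For a = 2.8 the estimate equals ln|2 - a| = ln 0.8, the logarithm of the slope at the equilibrium. For a = 4 the theoretical value is ln 2, and the estimate should lie close to it.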


Figure 38.7 The strange attractor of the Lorenz 
system 


Nonlinear models with forced oscillators 
are sometimes good candidates for exhibiting 
chaotic or near chaotic (i.e., seemingly ran- 
dom) longitudinal properties. In the social sci- 
ences, such a model has been developed and 
explored by Brown (1995b, Chapter 6). This 
model is a nonlinear system of four interdepen- 
dent differential equations that utilizes a forced 
oscillator with respect to a parameter specify- 
ing alternating partisan control of the White 
House (or other relevant governmental institu- 
tion). The dynamics of the system are investi- 
gated with regard to longitudinal damage to the 
environment, public concern for environmental 
damage, and the cost of cleaning up the envi- 
ronment. Variations in certain parameter values 
yield a variety of both stable and unstable non- 
linear dynamic behaviors, including behaviors 
that have apparent random-like properties typi- 
cally associated with chaos. 


4 Catastrophe theory 


Catastrophe theory refers to a type of behavior
among some nonlinear dynamic mathematical
models in which sudden or rapid, large-magnitude
changes in the value of one variable are a con-
sequence of a small change that occurs in
the value of a parameter (called a “control
parameter”). In this sense, catastrophe theory
can model phenomena that loosely follow a 
“straw that broke the camel’s back” scenario, 
although catastrophe theory can also be very 
general in its application. The modern under- 
standing of catastrophe theory has its genesis in 
work by Thom (1975). 

Nearly all early work with catastrophe theory 
employed polynomial functions in the spec- 
ification of differential-equation mathematical 
models. In part, this was an important conse- 
quence of the generality of Thom’s findings. 
Because all sufficiently smooth functions can 
be expanded using a Taylor series approxima- 
tion (which leads us to a polynomial represen- 
tation of the original model), it is possible to 
analyze the polynomial representation directly
(see Saunders, 1980, p. 20). However, scientists 
can avoid using one of Thom’s canonical poly- 
nomial forms by working with their own orig- 
inal theory-rich specifications as long as it is 
clear that the original specification has catas- 
trophe potential (Brown, 1995a). 

Bifurcations are fundamental to catastrophe 
theory. A bifurcation is an event that occurs in 
the evolution of a dynamic system in which the 
characteristic behavior of the system is trans- 
formed. With catastrophe theory, this occurs 
when an attractor in the system changes in 
response to change in the value of a parameter 
(called a “control parameter,” because its value 
controls the manifestation of the catastrophe). 
A catastrophe is one possible consequence of a 
bifurcation (as compared with phenomena asso- 
ciated with, say, subtle bifurcations or explo- 
sive bifurcations). 

The characteristic behavior of a dynamic sys- 
tem is determined by the behavior of trajecto- 
ries, which are the values of the variables in a 
system as they change over time. When trajecto- 
ries intersect with a bifurcation, they typically 
assume a radically different type of behavior 
as compared with that which occurred prior 
to the impact with the bifurcation. Thus, if a 
trajectory is “hugging” close to an attractor or 
equilibrium point in a system and then inter- 
sects with a bifurcation point, the trajectory 
may suddenly abandon the previous attractor 
and “fly” into the neighborhood of a different 
attractor. The fundamental characteristic of a 
catastrophe is the sudden disappearance of the 
influence of one attractor and its basin, com- 
bined with the dominant emergence of another 
attractor. Because multidimensional surfaces 
can also attract (together with attracting points 
on these surfaces), these gravity centers within 
dynamical systems are referenced more gener- 
ally as attracting hypersurfaces, limit sets, or 
simply attractors. 

A well-known example of the catastro-
phe specification that can easily be adapted
to applications in the social sciences is
the “spruce budworm problem.” The model 
addresses the budworm that eats the foliage 
of balsam fir trees, and a full description of 
this model can be found in Ludwig, Jones and 
Holling (1978). The basic idea is that the bud- 
worm population grows at a logistic rate up to 
some equilibrium level that is determined by 
the limits of the food supply. But the specifica- 
tion can be adapted to include a predation term 
that allows for a sudden change in the budworm 
population. This predation often takes the form 
of an avian consumer of budworms. 

To make this discussion more general, we 
can drop the budworm application entirely and 
consider the growth of any variable that has 
a logistic component combined with a loss 
term that contains the algebraic specification 
of predation as suggested above. Indeed, this 
has been done with respect to modeling mar- 
riages by Gottman, Murray, Swanson, Tyson 
and Swanson (2002, pp. 74-89). Such a model 
can be written as equations (5) and (6). 


$$\frac{dN}{dt} = rN(k - N) - p(N) \tag{5}$$
where
$$p(N) = \frac{BN^2}{A + N^2} \tag{6}$$


In equation (5), the first term on the right- 
hand side expresses logistic growth of the pop- 
ulation N. The second term expresses loss to 
this population, which is more fully specified 
in equation (6). From equation (6), in the limit
of large N, p(N) approaches B. But equation (6) acts as a
switch since it allows loss (e.g., predation) to
occur rapidly at values of N that approximately
equal $\sqrt{A}$. Substituting equation (6) into equa-
tion (5), and then setting equation (5) equal to
zero allows one to solve for the steady states of 
the model (i.e., the equilibria, N*). But this for- 
mulation has a cubic term in N, which results 
in three equilibria for certain combinations of 
the parameter values. In such situations, one of 
the equilibria is unstable, while the other two 
are stable. The population dynamics are such
that the value of N can be drawn toward one of 
the two stable equilibria while the value of one 
of the parameters (called a “control parameter” 
since its value can vary incrementally) changes. 
At some point, the ability of the first of the 
stable equilibria to continue to “hold on” (i.e., 
continue to influence) the value of N vanishes, 
and N is quickly captured within the basin of 
the other attracting equilibrium. This can pro- 
duce a “cusp” catastrophe. 
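
The jump between equilibria can be made concrete with a small numerical experiment. The sketch below assumes the loss term p(N) = BN²/(A + N²) and uses purely illustrative parameter values (k = 10, A = 1, B = 1) that do not come from Ludwig, Jones and Holling (1978) or from Gottman et al. (2002). Dividing the steady-state condition by N leaves the cubic r(k - N)(A + N²) - BN = 0, whose positive roots are the nonzero equilibria. The growth parameter r is then treated as the control parameter and raised in small steps while N is allowed to settle after each step; the Euler integration and all function names are assumptions chosen for brevity.

```python
import numpy as np


def growth(N, r, k, A, B):
    """Right-hand side of equation (5) with the assumed p(N) = B N^2 / (A + N^2)."""
    return r * N * (k - N) - B * N**2 / (A + N**2)


def positive_equilibria(r, k, A, B):
    """Positive real roots of r (k - N)(A + N^2) - B N = 0 (N = 0 factored out)."""
    roots = np.roots([-r, r * k, -(r * A + B), r * k * A])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real[real > 0])


def relax(N, r, k, A, B, dt=0.05, steps=40000):
    """Crude Euler integration until N settles near a stable equilibrium."""
    for _ in range(steps):
        N += dt * growth(N, r, k, A, B)
    return N


k, A, B = 10.0, 1.0, 1.0                 # illustrative values only
eq = positive_equilibria(0.05, k, A, B)  # three equilibria at r = 0.05
print(eq)

N = 0.5                                  # start near the lower stable equilibrium
track = []
for r in (0.050, 0.054, 0.058, 0.062):   # slowly raise the control parameter
    N = relax(N, r, k, A, B)
    track.append((r, N))
print(track)
```

With these assumed values the three positive equilibria at r = 0.05 are 4 - √11, 2, and 4 + √11 (roughly 0.68, 2, and 7.32); the outer two are stable and the middle one is unstable. As r is raised, N first creeps along the lower branch and then, once that equilibrium ceases to exist, moves rapidly to the upper branch.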

Another model, this time due to Zeeman 
(1972), illustrates a simple cusp catastrophe 
with yet a different specification. This model 
is used to describe the change in muscle fiber 
length (variable x) in a beating heart. The 
control parameter A (which in this instance 
refers to the electrochemical activity that ulti- 
mately instructs the heart when to beat) can 
change in its value continuously, and it is 
used to move trajectories across an equilib- 
rium hypersurface that has catastrophe poten- 
tial. The parameter q identifies the overall 
tension in the system, and f is a scaling 
parameter. The two differential equations in 
this system are $dx/dt = -f(x^3 - qx + A)$, and
$dA/dt = x - x_s$. Here, $x_s$ represents the mus-
cle fiber length at systole (the contracted heart
equilibrium). Setting the derivative dx/dt =0, 
we will find between one and three values 
for x, depending on the other values of the 
system. When there are three equilibria for 
x for a given value of the control parameter 
A, one of the equilibria is unstable and does 
not attract any trajectory. The other two equi- 
libria compete for the attention of the sur- 
rounding trajectories, and when a trajectory 
passes a bifurcation point in the system, the 
trajectory abandons one of these equilibria and 
quickly repositions itself into the neighbor- 
hood (i.e., the basin) of the other equilibrium. 
This rapid repositioning of the trajectory is the 
catastrophe. 
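
The count of equilibria for a given value of the control parameter can be checked numerically. The sketch below assumes that the equilibrium condition is the cubic x³ - qx + A = 0 and that the scaling parameter f is positive; under those assumptions an equilibrium attracts when 3x² > q, and three distinct equilibria exist only when 4q³ > 27A². The function names and the value q = 3 are illustrative and are not taken from Zeeman (1972).

```python
import numpy as np


def equilibria(q, A):
    """Real roots of x^3 - q x + A = 0, i.e., the points where dx/dt = 0."""
    roots = np.roots([1.0, 0.0, -q, A])
    return np.sort(roots[np.abs(roots.imag) < 1e-9].real)


def is_stable(x, q):
    """With f > 0, dx/dt = -f (x^3 - q x + A) attracts where 3 x^2 > q."""
    return 3.0 * x**2 > q


def fold_value(q):
    """|A| at which two equilibria merge and vanish: 4 q^3 = 27 A^2."""
    return 2.0 * (q / 3.0) ** 1.5


q = 3.0
inside = equilibria(q, 1.0)    # |A| below the fold value: three equilibria
outside = equilibria(q, 3.0)   # |A| above the fold value: one equilibrium

print(inside, [bool(is_stable(x, q)) for x in inside])
print(outside)
print(fold_value(q))
```

Moving A slowly past the fold value removes the equilibrium that the trajectory had been following, and the state must then travel to the single remaining equilibrium; that rapid move is the catastrophe.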

A sample picture of a catastrophe surface is
shown in Figure 38.8. In this figure, note how
the trajectories move to one of the two stable
equilibria in the upper left or lower right of the
graph. Note also that when a trajectory moves
over the lip of the cusp equilibrium surface, it
experiences a rapid vertical shift as it is drawn
toward the other stable equilibrium. This rapid
vertical shift is the classic signature of the catas-
trophe phenomenon.


Figure 38.8 A sample “cusp” catastrophe
equilibrium surface with associated variable
dynamics

Two social scientific examples of nonlin- 
ear differential equation dynamic catastro- 
phe specifications have been developed and 
explored by Brown (1995b, Chapters 3 and 5). 
One example involves the interaction between 
candidate preferences, feelings for a polit-
ical party, and the quality of an individ-
ual’s political context or milieu during the
1980 presidential election in the United States. 
The other example addresses the interaction 
between the partisan fragmentation of the 
Weimar Republic’s electorate, electoral de- 
institutionalization, and support for the Nazi 
party. Both examples are fully estimated; the 
first uses both individual and aggregate data 
while the second employs aggregate data only. 


5 The future of nonlinear modeling
in the social sciences 


It is highly probable that the study of non- 
linear dynamics—and in particular the use of 
functionally nonlinear continuous and discrete- 
time mathematical models—will continue to 
grow in the social sciences. Minimally, the 
limitations of the linear regression model will 
almost certainly force social scientists to con- 
tinue to expand their use of mathematical 
techniques into areas that have traditionally 
been exploited more heavily in the natural and 
physical sciences. It is not that social scientists 
should stop using linear models. It is just that 
growth and exploration are natural to the sci- 
entific enterprise, and research into nonlinea- 
rity promises to be an area of high yield in 
terms of scientific productivity in the social 
sciences. Also, new languages of mathemati- 
cal modeling—such as graph algebra (Brown, 
2007b)—are likely to ease the mechanics of 
nonlinear model building just as the use of 
such languages has similarly assisted the field
of engineering. It is difficult to predict with 
certainty how any field will develop. But it 
seems highly likely that the drive to under- 
stand the complexity of human societies will 
lead to the exploitation of mathematical models 
that similarly address greater levels of complexity.
The study of nonlinearity is one of a number 
of competing pathways to this richer level of 
understanding. 


Glossary 


Bifurcation A point at which there is a quali- 
tative change in the longitudinal behavior of a 
dynamic system or variable. 


Catastrophe theory A theory originally devel- 
oped by the French mathematician, René Thom, 
that describes sudden and large magnitude 
change in one variable as a consequence of 
incremental change in the value of a control 
parameter. 


Chaos A branch of mathematics in which 
change in the value of one or more variables in a
deterministic model appears random in nature, 
yielding great sensitivity to initial conditions. 


Nonlinear dynamics Longitudinal change 
that is not defined in terms of a constant incre- 
ment over time, usually associated with highly 
nonlinear dynamic model specifications. 


Phase space The dimensions housing change 
in a system’s dependent variables such that 
time (an independent variable) is suppressed. 


Trajectory Sequential change in one or more 
dependent variables within a system, usually 
within the context of phase space. 

